Published by marco on 18. Mar 2024 10:56:49 (GMT-5)
Updated by marco on 18. Mar 2024 11:18:01 (GMT-5)
The article Continuous Integration by Martin Fowler makes many interesting points. It is a compendium of know-how about CI by one of the industry heavyweights, who’s been using it for a long time.
While I found a lot of what he had to say interesting, I did wonder how applicable CI is for the kinds of teams that I know and work with. He makes several statements toward that end that pretty severely limit the applicability of what he calls “true CI” for many, if not most, teams.
I think he should have started his article with a very clear delineation for which kinds of organizations this kind of process is appropriate or efficient. In leaving it out, he seems to suggest that it’s the best for everyone, but at the end of the article, he lists what are, for me, quite severe restrictions. For example,
I don’t get the impression that Fowler is discussing a dream scenario toward which one works, but rather what he considers to be the absolute minimum process that anyone should be utterly embarrassed about themselves for not already having. I didn’t see a single sentence in this 40-page, at-times repetitive document about how to actually get there from here—or whether that’s really appropriate for many projects that people who read Martin Fowler might be working on.
I wonder about the wisdom of prioritizing integration seemingly above all else.
Below are citations from the long paper, with my comments interleaved.
“This contrast isn’t the result of an expensive and complex tool. The essence of it lies in the simple practice of everyone on the team integrating frequently, at least daily, against a controlled source code repository. This practice is called “Continuous Integration” (or it’s called “Trunk-Based Development”).”
He says this a lot, but I never hear about the costs. Is there no amount of time lost on integrations that is too high a price? Is there no task that he doesn’t break down into a million pieces in order to accommodate this style of work? Is there no efficiency lost by making each task into 1-hour chunks of coding that the entire team then integrates? Is that what we’re doing now?
“This will consist of both altering the product code, and also adding or changing some of the automated tests. During that time I run the automated build and tests frequently. After an hour or so I have the moon logic incorporated and tests updated.”
I’m quite fed up with reading this kind of optimistic bulls%!t. What kind of programmers are these who can accomplish major work in one hour? Or are the tasks that Fowler can conceive of all so simple that they can be accomplished in an hour? I’m very suspicious about these kinds of statements. It reminds me of game developers in the 90s talking about how they’d “written the whole engine in a weekend”, but then the game still took five more years to deliver.
“Some people do keep the build products in source control, but I consider that to be a smell − an indication of a deeper problem, usually an inability to reliably recreate builds. It can be useful to cache build products, but they should always be treated as disposable, and it’s usually good to then ensure they are removed promptly so that people don’t rely on them when they shouldn’t.”
Sure. But—priorities. Your product is not the pipeline. It’s your product. You can’t make everything a slave to the process. Remember to fix that which you can fix quickly, but to focus on your own priorities. Don’t polish a build so that Martin Fowler is happy, if it’s going to make your customers wait a lot longer for their release.
“The tests act as an automated check of the health of the code base, and while tests are the key element of such an automated verification of the code, many programming environments provide additional verification tools. Linters can detect poor programming practices, and ensure code follows a team’s preferred formatting style, vulnerability scanners can find security weaknesses. Teams should evaluate these tools to include them in the verification process.”
“Everyone Pushes Commits To the Mainline Every Day
“No code sits unintegrated for more than a couple of hours.”
This feels completely divorced from reality, but maybe I just “don’t get it.”
“If everyone pushes to the mainline frequently, developers quickly find out if there’s a conflict between two developers. The key to fixing problems quickly is finding them quickly. With developers committing every few hours a conflict can be detected within a few hours of it occurring, at that point not much has happened and it’s easy to resolve. Conflicts that stay undetected for weeks can be very hard to resolve.”
I agree with the last sentence, but at what cost? It feels like you’re going to spend so much time committing and integrating. How is finding out if you have conflicts the highest-priority task your team has?
“Full mainline integration requires that developers push their work back into the mainline. If they don’t do that, then other team members can’t see their work and check for any conflicts.”
Who finishes anything non-trivial in an hour? I can’t escape the feeling that one-hour chunks is almost too granular, that this size was chosen because it aids integration. While that’s a noble goal, I wonder how appropriate it is for many tasks, and to what degree the shape of the process affects the size of the solution set.
“Since there’s only a few hours of changes between commits, there’s only so many places where the problem could be hiding. Furthermore since not much has changed we can use Diff Debugging to help us find the bug.”
But don’t you waste time hunting bugs that would have gone away by themselves if the process weren’t so frenetic? If you rebase everything, then you’ll still encounter every integration conflict. If you merge, though, you can skip many of those interim integrations because subsequent changes might have obviated prior ones that might have caused conflicts.
Instead of testing the occasional version, you end up testing absolutely everything you do as if it were a release candidate. I’m not convinced that there’s no downside to that. I feel like it’s a waste of time if applied so mindlessly.
“Often people initially feel they can’t do something meaningful in just a few hours, but we’ve found that mentoring and practice helps us learn.”
I don’t know who you’re working with, but I wonder how useful that is. How useful is it to tailor your entire process to ruthlessly chopping up your work into tiny segments? What if that’s not how some people work? What if they can’t learn? Fire ‘em?
“Continuous Integration can only work if the mainline is kept in a healthy state. Should the integration build fail, then it needs to be fixed right away. As Kent Beck puts it: “nobody has a higher priority task than fixing the build”.”
Your goal ends up being to run the process, rather than to build the product. This sounds more and more like a cult.
“If the secondary build detects a bug, that’s a sign that the commit build could do with another test. As much as possible we want to ensure that any later-stage failure leads to new tests in the commit build that would have caught the bug, so the bug stays fixed in the commit build.”
“A team should thus automatically check for new versions of dependencies and integrate them into the build, essentially as if they were another team member. This should be done frequently, usually at least daily, depending on the rate of change of the dependencies.”
This seems like another thing that becomes a higher priority than building the product itself. A daily dependency check seems like overkill, but it’s automated, so who cares? He’s just running builds all the time, like we don’t have a climate crisis.
“if we rename a database field, we first create a new field with the new name, then write to both old and new fields, then copy data from the existing old fields, then read from the new field, and only then remove the old field. We can reverse any of these steps, which would not be possible if we made such a change all at once. Teams using Continuous Integration often look to break up changes in this way, keeping changes small and easy to undo.”
“Virtual environments make it much easier than it was in the past to do this. We run production software in containers, and reliably build exactly the same containers for testing, even in a developer’s workspace. It’s worth the effort and cost to do this, the price is usually small compared to hunting down a single bug that crawled out of the hole created by environment mismatches.”
I agree with this part, without qualification. At least as a goal.
“Being able to automatically revert also reduces a lot of the tension of deployment, encouraging people to deploy more frequently and thus get new features out to users quickly. Blue Green Deployment allows us to both make new versions live quickly, and to roll back equally quickly if needed, by shifting traffic between deployed versions.”
What about data schemas? What about if you don’t have a product that deploys on a web server or app store? I understand that there are solutions to this, but I wonder how great a fit they are to many teams? If your team is accustomed to SQL programming—or if you already have a suite of products that use SQL databases—then how worthwhile to your business is it to prioritize moving away from SQL to a local DB like SQLite, a NoSQL document store like RavenDB, or even to a completely different back-end like Rama?
“Continuous Integration effectively eliminates delivery risk. The integrations are so small that they usually proceed without comment. An awkward integration would be one that takes more than a few minutes to resolve.”
It sounds very much like it prioritizes eliminating delivery risk over all else. It is only applicable to products built in this way from the beginning.
“Having to put work on a new feature aside to debug a problem found in an integration test [or] feature finished two weeks ago saps productivity.”
So does constantly integrating, though! It can be noise. It’s like the noise of micro-reviewing AI responses. You have to figure out the sweet spot for your team and iterate toward that goal, always ensuring that your team can deliver even if the dream process is not already in place. Make a diagram of all the facets and discuss a plan for your project. Pragmatic. Realistic.
“They found that elite teams deployed to production more rapidly, more frequently, and had a dramatically lower incidence of failure when they made these changes. The research also finds that teams have higher levels of performance when they have three or fewer active branches in the application’s code repository, merge branches to mainline at least once a day, and don’t have code freezes or integration phases.”
What if you don’t have an elite team?
“A two week refactoring session may greatly improve the code, but result in long merges because everyone else has been spending the last two weeks working with the old structure. This raises the costs of refactoring to prohibitive levels. Frequent integration solves this dilemma by ensuring that both those doing the refactoring and everyone else are regularly synchronizing their work.”
Some refactoring can’t just be done in mini bites like that. Sometimes, you work on a POC that takes more time to verify. Now what? Throw it away and build it from scratch in bite-sized pieces? Or integrate a long-lived branch, which is verboten?
I’m working on a sweeping change to the way solutions are configured. It involves changing packages and versions in four different solutions. Should I have merged to master everywhere and involved the whole team in my project? That sounds stupid. Sure, it takes longer to verify and integrate in one big chunk, but it has the advantage that it didn’t make upgrading the solution format the number-one priority for all developers for a sprint or two.
“[…] teams that spend a lot of effort keeping their code base healthy deliver features faster and cheaper. Time invested in writing tests and refactoring delivers impressive returns in delivery speed, and Continuous Integration is a core part of making that work in a team setting.”
For non-legacy projects. Continuous delivery can only really work for web-based products or apps. A lot of other products have to be deployed to processes that aren’t as easy to update five times a day.
“Continuous Integration is more suited for team working full-time on a product, as is usually the case with commercial software. But there is much middle ground between the classical open-source and the full-time model. We need to use our judgment about what integration policy to use that fits the commitment of the team.”
That is the first time that he’s conceded that maybe there are use cases to which this whole article doesn’t apply very well.
“If a team attempts Continuous Integration without a strong test suite, they will run into all sorts of trouble because they don’t have a mechanism for screening out bugs. If they don’t automate, integration will take too long, interfering with the flow of development.”
No kidding. You need some serious test coverage to continuously integrate and deploy. I also wonder about the size of the product for which you can legitimately do this. Can you imagine if your test suite takes ten minutes to run and you integrate three or four times per day? Can you imagine how much time you’re not developing software because you’re integrating someone else’s code? I understand that this happens eventually, but I wonder about the wisdom of prioritizing integration seemingly above all else.
“Continuous Integration is about integrating code to the mainline in the development team’s environment, and Continuous Delivery is the rest of the deployment pipeline heading to a production release.”
This is a good definition and I wonder that he rewrote this whole essay and didn’t put this right at the top.
“Continuous Integration ensures everyone integrates their code at least daily to the mainline in version control. Continuous Delivery then carries out any steps required to ensure that the product is releasable to product[ion] whenever anyone wishes. Continuous Deployment means the product is automatically released to production whenever it passes all the automated tests in the deployment pipeline.”
Also excellent definitions that make the distinction clear. Continuous Delivery is the one that many teams could strive for, even if they will never be able to do Continuous Deployment. The question is: at what cost?
“Those who do Continuous Integration deal with this by reframing how code review fits into their workflow.”
Well, that’s an interesting statement. Integration trumps review? Get your code in there and review later? Trust in your tests? Are you kidding me? You should review design, as well as implementation. If everyone’s coding and committing and pushing in hours, when do they review? Is the idea to have people communicate with each other only when they’ve already built something?
Published by marco on 11. Feb 2024 22:33:58 (GMT-5)
In the article The web just gets better with Interop 2024 (WebKit Blog), Jen Simmons writes,
“The Interop project aims to improve interoperability by encouraging browser engine teams to look deeper into specific focus areas. Now, for a third year, Apple, Bocoup, Google, Igalia, Microsoft, and Mozilla pooled our collective expertise and selected a specific subset of automated tests for 2024.
“Some of the technologies chosen have been around for a long time. Other areas are brand new. By selecting some of the highest priority features that developers have avoided for years because of their bugs, we can get them to a place where they can finally be relied on.”
When we complain about features that remain unimplemented in browsers, we also have to acknowledge that there’s only so much you can do with a given team. There are problems that are technically easier to solve than others. When we complain, we’re actually more concerned about the prioritization of issues. We want to be able to influence what gets fixed when, rather than just having to passively hope that the manufacturer eventually gets around to it.
That’s where the Web Platform Tests come in. The Interop 2024 project follows on iterations from 2023, 2022, and 2021, when it all started.
Last year was a banner year. For CSS, “Subgrid, Container Queries, :has(), Motion Path, CSS Math Functions, inert and @property are now supported in every modern browser.” For JavaScript, we got “Improved Web APIs include Offscreen Canvas, Modules in Web Workers, Import Maps, Import Assertions, and JavaScript Modules” across all modern browsers.
These are all super-important features. E.g., Import Assertions for JSON imports, and Modules in Web Workers, which allow modern and modular programming, making it much easier to offload work, as one would with code running directly on modern operating systems.
What’s on the schedule for 2024?
- @property will similarly be more polished, as the percentage support is still quite low in many browsers.
- Fixes for how sub-grids or display: contents affect element order—as this means that we will get sites that are automatically accessible, as long as we build our sites logically.
- IndexedDB will make it easier to write powerful local-first applications (even though something like Automerge might be a better fit for apps offering concurrent or collaborative editing).
- popover with anchors is long overdue, as making usable tooltips and popups is an area fraught with custom code and half-baked solutions. It’s nice to see this become an area where you’ll no longer need custom JavaScript.
- @starting-style will fill a gap in CSS that finally allows sites to indicate how an element will transition from or to display: none.

See the original article for much more detail.
Published by marco on 8. Jan 2024 09:50:50 (GMT-5)
Updated by marco on 9. Jan 2024 11:04:29 (GMT-5)
I published a very similar version of the following article in the DevOps Wiki at Uster Technologies AG. Since nearly all of that post is general knowledge that I would have been happy to find before I started my investigations, I’m sharing it here.
When we think about navigating or debugging our code, we usually focus on the code we’ve written ourselves—local sources in our file system. IDEs have classically focused on being able to debug and navigate this code.
More and more, though, we’re also interested in navigating and debugging our versioned and compiled dependencies:
Most of these are available as source code. We would ideally like to be able to navigate and debug that code just as easily as we can our own.
The following sections define file types and terminology, and then explain how these concepts apply to debugging and navigation for external sources. You can also just jump to the sections on producing or consuming packages (especially as relates to authentication for private sources).
The following diagram provides an overview of the process of obtaining external packages, along with their symbols and source files. It looks quite complicated, but accommodates the flexibility required by various stakeholders.
There are several types of files associated with debugging and navigation:
- DLL: the compiled assembly itself
- PDB: the symbol file that maps the compiled code back to source files and line numbers
- XML: the compiler-generated documentation file for the assembly’s API
- *.cs: the original C# source files
It’s reasonable to ask why this process is so complex.
Why doesn’t the nupkg just include the PDB and the *.cs files?

The system was designed for use cases where most sources were closed. That has changed, but the system still reflects the original design choices. The PDB files can also add about 30% to the size of the package. The original use cases preferred to avoid using 30% more space for package downloads that didn’t need the debugging information.
Again, historically, the use cases were for providing improved stack traces with symbols, but not to provide access to closed sources. Even if the sources are partially open, access may be restricted to only some users of the packages or symbols. Having the IDE request the sources separately allows an additional authorization phase.
The defaults still reflect the original use cases, which actually represent fewer and fewer packages as time goes on.
These answers aren’t particularly satisfying if your use case happens to be “make a package that has symbols for excellent stack traces and sources for excellent debugging”. At least we now have IDEs that know how to work with this system and there is a lot of automation for producing packages with the desired symbol and source-code support.
A developer debugs source code by interrupting execution of a program—either manually or by setting breakpoints—and then stepping through the instructions, examining the contents of symbols (variables) to investigate the runtime behavior and operation of the system.
The debugger uses the PDB to allow source-level debugging, i.e. debugging in the original source code. While debugging in “lower” formats is possible, it’s not nearly as reliable as being able to step through the code in the original source code, using the original symbols.
How does the debugger obtain the PDB for a given DLL?
DLLs and PDBs have unique identifiers that make it possible to request and download the correct file.

Once the debugger has the PDB, it has everything it needs—except the source code.
If the PDB was generated locally, then it most likely references the source files that are still in the same locations in the file system as when it was built. In that case, the debugger easily finds the source files because they’re just at the paths that are directly referenced by the PDB.

If the PDB was not generated locally or the source-code paths do not match, then there are other tricks to find the source files. Visual Studio allows you to set “Directories containing source code” for the “Debug Source Files”.
If the sources aren’t available locally, e.g., for a NuGet package, then there is a system called SourceLink that is extremely well-supported in the .NET world that makes it possible to easily download the source files that generated a DLL and that are referenced by its PDB.
Things to be aware of:
If the package does not support SourceLink, but the sources are available, then you can download the sources locally and use the solution-level mapping above to tell the debugger where the source files are. You can also just point the debugger to the top-level folder when it asks for the file’s location, in which case the debugger makes the entry for you.
A developer navigates by requesting the source code for a symbol. For example, if the declared type of a variable in an open source file is the class Setting, then the developer can ask the IDE to show the source of Setting by Ctrl + clicking, by pressing F12 in Visual Studio, or by pressing Ctrl + B in Rider.
As with debugging, navigating local sources is straightforward, since the sources are in the local file system. For symbols in NuGet packages, the IDE has to be clever enough to download, cache, and use the sources.
Visual Studio on its own does not support navigating to external sources via SourceLink. Instead, it always decompiles external sources, as shown in the example below.
If you have ReSharper installed, then the default setting is to try as hard as possible to avoid showing a decompiled version.
You can also add “Folder Substitutions” in the “Advanced Symbol options…” for navigating to “External Sources”. The option does not seem to be available in Rider.
SourceLink is a system that provides source files for external sources like NuGet packages for debugging or navigation. In order for this to work, the package must provide access to its sources and the client must be properly configured for debugging.
See below for troubleshooting information, especially as relates to authentication for packages and source code pulled from authenticated locations.
A decompiled version of the source code is a reconstruction of the original source from the instructions and information in the DLL and PDB. When sources cannot be located for a given symbol, Visual Studio, ReSharper, and Rider will produce a decompiled version as a fallback.
This is often good enough to be able to read the code reasonably well, but it leaves certain common constructs in their “lowered” format. E.g., calls to extension methods appear as static-method calls rather than as targeted on the first parameter.
This can make debugging difficult, as the instructions don’t match the mapping. Rider has support for patching the PDB on-the-fly to allow more comfortable debugging of decompiled sources. This is, however, a fallback solution for external packages over which you have no control. It’s best to configure your packages to publish with symbols and sources available to IDEs that support them, as shown in the next section.
The documentation to Enable debugging and diagnostics with Source Link is thorough and tells you all you need to know about all of the options.
If you’re working with Azure DevOps Services, you should include the following package reference:
<ItemGroup>
<PackageReference Include="Microsoft.SourceLink.AzureRepos.Git" Version="8.0.0" PrivateAssets="All"/>
</ItemGroup>
With this, you’re all set. The package is published to the Azure Artifacts, with a corresponding snupkg available on the Azure symbol server and sources available via the repository URL (subject to authorization; see below for troubleshooting).
You can set a few optional properties, detailed below. Most projects won’t need to set these, but they are included to spare you the research if you see them in code examples, either in your institution’s code or online. As noted, the only line you need is the package reference shown above.
- IncludeSymbols: publishes symbols, either embedded in the assembly (if DebugType is set to embedded) or in a separate symbol package (if SymbolPackageFormat is set to snupkg). This is implied when the NuGet package Microsoft.SourceLink.AzureRepos.Git is included, as shown below.
- SymbolPackageFormat: defaults to snupkg when the NuGet package Microsoft.SourceLink.AzureRepos.Git is included, as shown below.
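For illustration, here’s a minimal sketch of what setting these optional properties could look like in a project file. The values shown are just the ones discussed above; as noted, most projects only need the package reference.

<PropertyGroup>
  <!-- Produce a separate symbol package (*.snupkg) alongside the NuGet package. -->
  <IncludeSymbols>true</IncludeSymbols>
  <SymbolPackageFormat>snupkg</SymbolPackageFormat>
  <!-- Alternative: embed the PDB directly in the assembly instead of shipping a symbol package. -->
  <!-- <DebugType>embedded</DebugType> -->
</PropertyGroup>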
See the SourceLink documentation for more details. Among other details, they also note that projects that target .NET 8 no longer need to include this support explicitly because Azure Repos are supported by default, as detailed in the readme for the SourceLink project.
“If your project uses .NET SDK 8+ and is hosted by the above providers (GitHub, Azure Repos, GitLab, BitBucket) it does not need to reference any Source Link packages or set any build properties.”
You can also include the packaging conditionally in the Directory.Build.Targets, as shown below.
<ItemGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<PackageReference Include="Microsoft.SourceLink.AzureRepos.Git" Version="8.0.0" PrivateAssets="All"/>
</ItemGroup>
See the appendix for Directory.Build.Props and Directory.Build.Targets for more information about which variables and directives are respected in which file.
If a package has SourceLink enabled and you have access to the online repository from which it was built, then to seamlessly debug into that source code, ensure the following:
As noted above, Visual Studio doesn’t support navigating via SourceLink. To browse external sources with JetBrains tools, ensure the following:
Once you’re sure that the package supports SourceLink, then you should also make sure that the Just My Code setting is disabled.
When Just My Code is enabled, the debugger skips over any code that doesn’t correspond to source code in one of the local projects.
- Does the package include symbols (a .pdb file next to the .dll file)?
- If the PDB is not included with the package, is it available on a Symbol Server?
- Is the PDB being copied to the output folder next to the DLL?

If it’s available in the package, but is not being copied to the output folder, then if you’re using .NET 7.0 SDK or higher, you can use the build property named CopyDebugSymbolFilesFromPackages.
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<CopyDebugSymbolFilesFromPackages>true</CopyDebugSymbolFilesFromPackages>
</PropertyGroup>
Verify that the symbols for the module you’re trying to debug have been loaded. If they aren’t loaded, you can try to load symbols while debugging. For more details and a screenshot, see Just My Code debugging.
If you’re trying to navigate in code, but ReSharper or Rider keeps decompiling instead of getting the sources from SourceLink, then check your External Sources settings in ReSharper or Rider. Verify that the tool is configured to check for external sources before it tries decompiling.
If the IDE is having trouble authenticating, then you will usually see a decompiled version instead. Sometimes the code is so close to the original that it’s hard to tell; scroll to the top to see if it includes the “decompiled by JetBrains…” header.
Once the IDE has decompiled a source file, it will continue to use this cached copy until you close the tab, or sometimes you have to close and re-open the project. If you’re troubleshooting your way through this setup, then you can temporarily disable decompilation as a fallback, which avoids producing the unwanted source-code variant in the first place.
Visual Studio uses the authentication associated with the logged-in user that you use to enable the IDE. This can be in a weird state if you’ve recently changed your password or your authentication token is stale or in a non-refreshable state. Try logging out and back in.
JetBrains tools (Rider, ReSharper, DotPeek, etc.), on the other hand, need to be given a token.
If the tool shows a notification indicating that authentication has failed, then do the following:
- Click Configure on the notification to show a dialog
- Enter your user name (e.g., john.doe@example.com) and your credentials (an Azure PAT; see below)
- Press the Test button to verify that it works (you should see OK 200)
- Press Ok to save the credentials

However, there is a bug whereby JetBrains tools fail to show a notification or offer a way to enter credentials. [1] That’s going to look something like this:
It claims that it can download the source, but it never completes. You have to cancel the dialog. If you then look at the ReSharper Output, then you’ll see something like this:
The relevant text is at the end of the third line, which indicates that the request for the source file returned a “Non-OK HTTP status code”.
PdbNavigator: Searching for 'Example.Core.AppConfig.AppConfigKeyAttribute' type sources in C:\Users\john.doe\.nuget\packages\example.core.appconfig\4.1.0\lib\netstandard2.0\Example.Core.AppConfig.pdb
PdbNavigator: File names (1) are inferred for type Example.Core.AppConfig.AppConfigKeyAttribute
PdbNavigator: Downloader: https://dev.azure.com/example/example.Core/_apis/git/repositories/Example.Core.LabInstruments/items?api-version=1.0&versionType=commit&version=8b34c2aa672facd47e835c27152f695fa796a408&path=/Example.Core/DotNetStandard/Example.Core.AppConfig/AppConfigKeyAttribute.cs -> Non-OK HTTP status code
The most reliable way to fix this is to create the credentials in the Credential Manager. Be aware that you will need to create an Azure PAT (personal access token).
Under Windows Credentials, look for an entry named JetBrains SourceLink https://dev.azure.com/exampleOrganization.
If you don’t have this entry, then that’s the problem. If you have it, but you still can’t get the sources, then edit the entry to have valid credentials.
To create or edit the record, do the following from the Credentials Manager:
- Internet or network address: JetBrains SourceLink https://dev.azure.com/exampleOrganization
- User name: john.doe@example.com
- Password: an Azure PAT (personal access token)
As you can see above, although publishing a package is relatively straightforward, there are quite a few stumbling blocks on the way to consuming the package for navigation and debugging. Once you have everything set up and working, it’s great, but … there is still one other drawback.
You can’t edit the code for packages.
This is not optimal. Optimally, we’d like to quickly verify that a change to upstream code would address an issue in downstream code without having to generate new packages. It would be great to just edit the upstream code as if it were part of your downstream solution until you’re sure that the change would address your downstream issue. At that point, you can copy the changes back to the upstream solution (where the dependency is produced), add tests, and produce a new version, being pretty certain that the change is effective.
The shortest possible developer-feedback loop with code in external packages is to rebuild the dependency locally and use its DLL (and PDB) in place of the packaged one.

If your package has dependencies or your change in the external package’s solution touches multiple packages, then you can do the following:
If it gets too complicated to do locally, then you can always commit, push, and have the CI generate new versions of your packages (hopefully with a prerelease version, e.g., 3.2.4-preview2).
The solutions outlined above have a reasonable turnaround time, but sometimes you want to pretend that the external packages are just internal projects instead. This basically entails adding the upstream projects to your solution and referencing them directly instead of through the packages.
At that point, you can edit, debug, and navigate the code as if it were your own.
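As a sketch, assuming a hypothetical package named Example.Core.AppConfig and a local checkout of its repository (the path below is made up), the swap in the consuming project might look something like this:

<ItemGroup>
  <!-- The original package reference, commented out while working on the upstream code. -->
  <!-- <PackageReference Include="Example.Core.AppConfig" Version="4.1.0" /> -->
  <!-- Reference the upstream project directly instead. -->
  <ProjectReference Include="..\Example.Core\Example.Core.AppConfig\Example.Core.AppConfig.csproj" />
</ItemGroup>

Switch back to the PackageReference once the new package version has been published.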
See the “Project Munging with Tools & PowerShell” section of How to Debug NuGet Packages with Symbols and Source Link Painlessly for a PowerShell script that can help you automate part of this.
MSBuild supports including common configuration in project files. While earlier versions required all configuration to be included explicitly, modern versions include configuration files with special names automatically, greatly simplifying common configuration and reducing clutter in project files.
If the file is named Directory.Build.Props or Directory.Build.Targets, it is picked up automatically and included for all projects in that folder or any subfolder. If you use a different name, then you have to explicitly reference that file from a project or from another *.props or *.targets file. If you choose your own name, you don’t have to use the Build.Properties or Build.Targets convention, but it’s strongly recommended, to avoid confusion.
You can use a Directory.Build.Properties file to include settings for all projects in a folder or set of subfolders.
For example, the following package reference can and should be included in Directory.Build.Props:
<PackageReference Include="Microsoft.SourceLink.AzureRepos.Git" Version="8.0.0" PrivateAssets="All"/>
If you want to include settings conditionally based on build configuration (e.g., Configuration or Platform), then you’ll have to use the Directory.Build.Targets file, which has access to those variables.
<ItemGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
<PackageReference Include="Microsoft.SourceLink.AzureRepos.Git" Version="8.0.0" PrivateAssets="All"/>
</ItemGroup>
Place the Directory.Build.Props file at the root of the solution.
Published by marco on 30. Dec 2023 22:46:09 (GMT-5)
The article Exploring Generative AI by Birgitta Böckeler (MartinFowler.com) is chock-full of helpful tips from eight newsletters totaling 25 pages that she wrote throughout 2023. I include some of my own thoughts, but most of this article consists of citations.
A lot of my analysis and notes boils down to: you need to know what you’re doing to use these tools. They can help you build things that you don’t understand, but it’s not for medium- or long-term solutions. I’ve written a lot more about the need for expertise in How important is human expertise?
“The following are the dimensions of my current mental model of tools that use LLMs (Large Language Models) to support with coding.
“Assisted tasks”
“These are the types of tasks I see most commonly tackled when it comes to coding assistance, although there is a lot more if I would expand the scope to other tasks in the software delivery lifecycle.”
- Finding information faster, and in context
- Generating code
- “Reasoning” about code (Explaining code, or problems in the code)
- Transforming code into something else (e.g. documentation text or diagram)
“In this particular case of a very common and small function like median, I would even consider using generated code for both the tests and the function. The tests were quite readable and it was easy for me to reason about their coverage, plus they would have helped me remember that I need to look at both even and uneven lengths of input. However, for other more complex functions with more custom code I would consider writing the tests myself, as a means of quality control. Especially with larger functions, I would want to think through my test cases in a structured way from scratch, instead of getting partial scenarios from a tool, and then having to fill in the missing ones.”
“The tool itself might have the answer to what’s wrong or could be improved in the generated code − is that a path to make it better in the future, or are we doomed to have circular conversation with our AI tools?”
“[…] generating tests could give me ideas for test scenarios I missed, even if I discard the code afterwards. And depending on the complexity of the function, I might consider using generated tests as well, if it’s easy to reason about the scenarios.”
“For the purposes of this memo, I’m defining “useful” as “the generated suggestions are helping me solve problems faster and at comparable quality than without the tool”. That includes not only the writing of the code, but also the review and tweaking of the generated suggestions, and dealing with rework later, should there be quality issues.”
- […]
- Boilerplate: Create boilerplate setups like an ExpressJS server, or a React component, or a database connection and query execution.
- Repetitive patterns: It helps speed up typing of things that have very common and repetitive patterns, like creating a new constructor or a data structure, or a repetition of a test setup in a test suite. I traditionally use a lot of copy and paste for these things, and Copilot can speed that up.
Interesting. I’ve just always used the existing templates or made my own expansion templates. At least then it makes exactly what I want—and even leaves the cursor in the right position afterwards.
Another thought I had is that the kind of programmer that this helps doesn’t use any generalization for common patterns. Otherwise, the suggestions wouldn’t be useful because they can’t possibly take advantage of those highly specialized patterns. Or maybe they can, if they’re included in the context. It seems unlikely, if only because the sample size is too small to be able to influence the algorithm sufficiently. But maybe enough weight can be given to the immediate context to make that work somehow.
At that point, though, you’re just spending all of your time coaxing your LLM copilot into building the code that you already knew you wanted. This practice seems like it would end up discouraging generalization and abstraction—unless it can grok your API (as I’ve noted above).
This is an age-old problem that is maybe solved, once and for all. The problem is that when you generalize a solution, it becomes much easier, more efficient, and more economical to maintain, but it can end up being more difficult to understand. If the API is well-made and addresses a problem domain with a complexity that the programmer is actually capable of understanding, then the higher-level API may be easier to use, and perhaps even maintain.
However, a non-generalized solution is sometimes easier for a novice or less-experienced programmer to understand and extend. It’s questionable whether you’d want your code being extended and maintained by someone who barely—or doesn’t—understand it, but that situation is sometimes thrust on teams and managers.
“This autocomplete-on-steroids effect can be less useful though for developers who are already very good at using IDE features, shortcuts, and things like multiple cursor mode. And beware that when coding assistants reduce the pain of repetitive code, we might be less motivated to refactor.”
“You can use a coding assistant to explore some ideas when you are getting started with more complex problems, even if you discard the suggestion afterwards.”
“The larger the suggestion, the more time you will have to spend to understand it, and the more likely it is that you will have to change it to fit your context. Larger snippets also tempt us to go in larger steps, which increases the risk of missing test coverage, or introducing things that are unnecessary.”
On the other hand,
“[…] when you do not have a plan yet because you are less experienced, or the problem is more complex, then a larger snippet might help you get started with that plan.”
This is not unlike using StackOverflow or any other resource. There’s no getting around knowing what you’re doing, at least a little bit. You can’t bootstrap without even a bootstrap.
“Experience still matters. The more experienced the developer, the more likely they are to be able to judge the quality of the suggestions, and to be able to use them effectively. As GitHub themselves put it: “It’s good at stuff you forgot.” This study even found that “in some cases, tasks took junior developers 7 to 10 percent longer with the tools than without them.””
“Using coding assistance tools effectively is a skill that is not simply learned from a training course or a blog post. It’s important to use them for a period of time, experiment in and outside of the safe waters, and build up a feeling for when this tooling is useful for you, and when to just move on and do it yourself.”
This is just like any other tool. There is no shortcut to being good at something complex. The only tasks for which there are shortcuts are the non-complex ones. In that case, you should be asking yourself why your solutions involve so much repetitive programming.
“We have found that having the right files open in the editor to enhance the prompt is quite a big factor in improving the usefulness of suggestions. However, the tools cannot distinguish good code from bad code. They will inject anything into the context that seems relevant. (According to this reverse engineering effort, GitHub Copilot will look for open files with the same programming language, and use some heuristic to find similar snippets to add to the prompt.) As a result, the coding assistant can become that developer on the team who keeps copying code from the bad examples in the codebase.”
That will be so much fun, especially if you can get an echo chamber of lower-skilled programmers approving each other’s pull requests. 😉
“We also found that after refactoring an interface, or introducing new patterns into the codebase, the assistant can get stuck in the old ways. For example, the team might want to introduce a new pattern like “start using the Factory pattern for dependency injection”, but the tool keeps suggesting the current way of dependency injection because that is still prevalent all over the codebase and in the open files. We call this a poisoned context, and we don’t really have a good way to mitigate this yet.”
“Using a coding assistant means having to do small code reviews over and over again. Usually when we code, our flow is much more about actively writing code, and implementing the solution plan in our head. This is now sprinkled with reading and reviewing code, which is cognitively different, and also something most of us enjoy less than actively producing code. This can lead to review fatigue, and a feeling that the flow is more disrupted than enhanced by the assistant.”
“Automation Bias is our tendency “to favor suggestions from automated systems and to ignore contradictory information made without automation, even if it is correct.” Once we have had good experience and success with GenAI assistants, we might start trusting them too much.”
“[…] once we have that multi-line code suggestion from the tool, it can feel more rational to spend 20 minutes on making that suggestion work than to spend 5 minutes on writing the code ourselves once we see the suggestion is not quite right.”
“Once we have seen a code suggestion, it’s hard to unsee it, and we have a harder time thinking about other solutions. That is because of the Anchoring Effect, which happens when “an individual’s decisions are influenced by a particular reference point or ‘anchor’”. so while coding assistants’ suggestions can be great for brainstorming when we don’t know how to solve something yet, awareness of the Anchoring Effect is important when the brainstorm is not fruitful, and we need to reset our brain for a fresh start.”
“The framing of coding assistants as pair programmers is a disservice to the practice, and reinforces the widespread simplified understanding and misconception of what the benefits of pairing are.”
“Pair programming however is also about the type of knowledge sharing that creates collective code ownership, and a shared knowledge of the history of the codebase. It’s about sharing the tacit knowledge that is not written down anywhere, and therefore also not available to a Large Language Model. Pairing is also about improving team flow, avoiding waste, and making Continuous Integration easier. It helps us practice collaboration skills like communication, empathy, and giving and receiving feedback. And it provides precious opportunities to bond with one another in remote-first teams.”
“LLMs rarely provide the exact functionality we need after a single prompt. So iterative development is not going away yet. Also, LLMs appear to “elicit reasoning” (see linked study) when they solve problems incrementally via chain-of-thought prompting. LLM-based AI coding assistants perform best when they divide-and-conquer problems, and TDD is how we do that for software development.”
“Some examples of starting context that have worked for us:”
- ASCII art mockup
- Acceptance Criteria
- Guiding Assumptions such as:
- “No GUI needed”
- “Use Object Oriented Programming” (vs. Functional Programming)
“For example, if we are working on backend code, and Copilot is code-completing our test example name to be, “given the user… clicks the buy button ” , this tells us that we should update the top-of-file context to specify, “assume no GUI” or, “this test suite interfaces with the API endpoints of a Python Flask app”.”
“Copilot often fails to take “baby steps”. For example, when adding a new method, the “baby step” means returning a hard-coded value that passes the test. To date, we haven’t been able to coax Copilot to take this approach.”
Knowing a bit about how LLMs work, there’s no way you really could train it to do TDD, because it’s an iterative process. It doesn’t know what TDD is, nor does the way it’s built have any mechanism for learning how to do it. Nor does it know what coding is, for that matter. It’s just a really, really good guesser. Everything it does is hallucination. It’s just that some of it is useful.
“As a workaround, we “backfill” the missing tests. While this diverges from the standard TDD flow, we have yet to see any serious issues with our workaround.”
Changing how you program because of the tool is something you should do deliberately. This is a slippery slope.
“For implementation code that needs updating, the most effective way to involve Copilot is to delete the implementation and have it regenerate the code from scratch. If this fails, deleting the method contents and writing out the step-by-step approach using code comments may help. Failing that, the best way forward may be to simply turn off Copilot momentarily and code out the solution manually.”
Jaysus. That’s pretty grim.
“The common saying, “garbage in, garbage out” applies to both Data Engineering as well as Generative AI and LLMs. Stated differently: higher quality inputs allow for the capability of LLMs to be better leveraged. In our case, TDD maintains a high level of code quality. This high quality input leads to better Copilot performance than is otherwise possible.”
“Model-Driven Development (MDD). We would come up with a modeling language to represent our domain or application, and then describe our requirements with that language, either graphically or textually (customized UML, or DSLs). Then we would build code generators to translate those models into code, and leave designated areas in the code that would be implemented and customized by developers.”
“That unreliability creates two main risks: It can affect the quality of my code negatively, and it can waste my time. Given these risks, quickly and effectively assessing my confidence in the coding assistant’s input is crucial.”
“Can my IDE help me with the feedback loop? Do I have syntax highlighting, compiler or transpiler integration, linting plugins? Do I have a test, or a quick way to run the suggested code manually?”
“I have noticed that in CSS, GitHub Copilot suggests flexbox layout to me a lot. Choosing a layouting approach is a big decision though, so I would want to consult with a frontend expert and other members of my team before I use this.”
That’s because you care about architecture. Review was always important, but more so when code is being written by something you never hired.
“How long-lived will this code be? If I’m working on a prototype, or a throwaway piece of code, I’m more likely to use the AI input without much questioning than if I’m working on a production system.”
“[…] it’s also good to know if the AI tool at hand has access to more information than just the training data. If I’m using a chat, I want to be aware if it has the ability to take online searches into account, or if it is limited to the training data.”
“To mitigate the risk of wasting my time, one approach I take is to give it a kind of ultimatum. If the suggestion doesn’t bring me value with little additional effort, I move on. If an input is not helping me quick enough, I always assume the worst about the assistant, rather than giving it the benefit of the doubt and spending 20 more minutes on making it work.”
“GitHub Copilot is not a traditional code generator that gives you 100% what you need. But in 40-60% of situations, it can get you 40-80% of the way there, which is still useful. When you adjust these expectations, and give yourself some time to understand the behaviours and quirks of the eager donkey, you’ll get more out of AI coding assistants.”
Published by marco on 15. Dec 2023 13:15:17 (GMT-5)
The latest video by Nick Chapsas has a more-than-usually clickbait-y headline. The “big” problem that NativeAOT has, is that it’s 4% slower during runtime than the JIT-compiled version.
That doesn’t seem like such a big problem to me, when the point of AOT is to improve cold-start times for applications launched on-demand. For that use-case, AOT shines. It’s over 4x faster on startup than the JIT-compiled version. It’s incredibly impressive that JIT-compilation takes less than 1/10 of a second, but it’s still 4x slower than AOT.
So, you get the app started 4x faster, but it then performs 4% more slowly than the non-AOT version. It really depends on the use-case, but it’s great for the common one of starting a server to answer a function call—think Azure Functions or AWS Lambdas—and then shutting down again, possibly immediately.
Damian P Edwards (Principal Architect at Microsoft) commented on the post,
“[There are a] few things that cause the slightly lower performance in native AOT apps right now. First (in apps using the web SDK) is the new DATAS Server GC mode. This new GC mode uses far less memory than traditional ServerGC by dynamically adapting memory use based on the app’s demands, but in this 1st generation it impacts the performance slightly. The goal is to remove the performance impact and enable DATAS for all Server GC apps in the future.
“Second is CoreCLR in .NET 8 has Dynamic PGO enabled by default, which allows the JIT to recompile hot methods with more aggressive optimizations based on what it observes while the app is running. Native AOT has static PGO with a default profile applied and by definition can never have Dynamic PGO.
“Thirdly, JIT can detect hardware capabilities (e.g. CPU intrinsics) at runtime and target those in the code it generates. Native AOT however defaults to a highly compatible target instruction set which won’t have those optimizations but you can specify them at compile time based on the hardware you know you’re going to run on.
“Running the tests in [the] video with DATAS disabled and native AOT configured for the target CPU could improve the results slightly.”
To summarize:
An AOT-compiled app cannot benefit from dynamic PGO. It benefits from static PGO, but cannot recompile itself on-the-fly because it doesn’t have a JIT compiler to do so.
The JIT-compiled app can dynamically recompile what it observes as performance hotspots with more highly optimized code. I wrote a bit about how Safari does something similar for JavaScript in Optimizing compilation and execution for dynamic languages—although for JavaScript, dynamic recompilation is sometimes necessary for backing out of an incorrect assumption about what type a variable is going to have.
As well, a JIT-compiled app can take actual hardware capabilities into account, while an AOT-compiled app necessarily targets a static hardware profile.
The generic hardware profile is going to be extremely conservative about capabilities because if it assumes a capability that doesn’t exist, the app simply won’t run. Choosing a hardware profile for AOT that matches the target hardware would boost performance.
I guess that was more of a rephrasing, rather than a summary.
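As a hedged sketch of that last point: NativeAOT lets you opt into a specific instruction set at publish time. The property and values below follow the NativeAOT optimization docs; the exact sets you can specify depend on your target hardware.

<PropertyGroup>
  <PublishAot>true</PublishAot>
  <!-- Assumption: the target machines support these extensions; adjust to your hardware. -->
  <IlcInstructionSet>avx2,bmi2,fma,pclmul,popcnt,aes</IlcInstructionSet>
</PropertyGroup>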
Anyway, another commenter asked,
“[…] would it be possible in the future for a JIT application with Dynamic PGO that has run for a while and has made all kinds of optimizations to then create a “profile” of sorts that could be used by the Native AOT compiler to build an application that is both fast in startup time and highly optimized for a given workload?”
Yes. That should be possible. It’s unclear what sort of extra performance boost this would give, especially if you’d already fine-tuned the target hardware profile—which is the first thing you should do. I could imagine adding this sort of profiling as a compilation step, though. You always have to be careful, though, whenever you’re running something in production that is different than what you’ve tested. We put a lot of faith in the JIT and dynamic PGO, don’t we?
I wanted to also note that, at the end of the video, Chapsas showed Microsoft’s numbers, which confirm the performance drop, but also show an over 50% reduction in working set! Dude! How do you not mention that!? The app uses less than half of the memory and runs almost as fast? Yes, please! That’s a huge win for people paying for cloud-based services.
For once, I’m somewhat surprised to see how naive Nick’s take is—that a 4% drop in performance is at all significant, especially when the “slow” version is still processing 50,000 requests per second in a performance-constrained environment. He did mention a trade-off, but was very excited to tell people that AOT is slower during runtime.
There are always trade-offs and you should be very aware of the actual non-functional requirements for your application before you decide whether to use a technology or not. For 99.9% of applications, the 4% drop in performance vis-à-vis a JIT-compiled version won’t be the deciding factor. When it’s accompanied by a working set that’s only ½ the size, then it becomes an even more attractive target.
Published by marco on 15. Dec 2023 11:52:23 (GMT-5)
Updated by marco on 15. Dec 2023 12:23:31 (GMT-5)
A build started failing after a commit. However, the errors had nothing to do with the changes in the commit. A little investigation revealed that the cloud agent had started using a newer version of the build tool that included an expanded set of default warnings. These warnings started appearing first on CI because developers hadn’t had the chance to update their tools yet.
The “warnings as errors” setting turned what would have been a build with a few extra warnings into a failing build that prevented a developer from being able to apply completely unrelated changes. The setting allowed new, unrelated, and irrelevant warnings to push their way to the top of the priority queue.
👉 tl;dr: I don't think we should use the "warnings as errors" setting anymore. You can get the same benefit—and even more—by using newer, finer-grained configuration options.
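For orientation: in the .NET world, the setting in question is an MSBuild property, and even at that level there are finer-grained knobs than the global switch. A sketch; the values here are illustrative, not a recommendation:

<!-- The blunt instrument: every warning fails the build -->
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<!-- Finer-grained: escalate only the warnings you actually care about -->
<WarningsAsErrors>nullable</WarningsAsErrors>

We'll get to even better options, like per-rule severities in an EditorConfig, further below.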
This section wasn't included in my original draft of this essay. It only occurred to me in the shower that this is the real reason why I wrote a ten-page essay to answer a teammate's question in a PR review.
In hindsight, it’s obvious: to answer whether we should re-enable the “warnings as errors” setting, we should first think about what doing so would accomplish. What need does it fulfill?
The rest of this essay meanders drunkenly along a path toward what I hope is a reasonable answer.
I understand the sentiment. You’re in a team that never, or rarely, looks at warnings. You’ve given up on teaching them how to look at warnings and keep them fixed. Fine. You just make every warning an error and now they absolutely have to fix everything. Problem solved.
Except it isn’t, is it? Not really.
What you’ve now done is ensured that your team will be constantly fixing errors that aren’t really errors at times when they wouldn’t want or need to be doing so.
Don’t make me waste time pretty-printing code that I’m still writing! How annoying is it when you can’t run a test because your comment has an extra line below it? Are you kidding me? [1]
If your team does care about warnings, then, … why do you need to make them errors?
Before handcuffing developers with a setting, think about whether there isn’t a trust problem first. Are you addressing a symptom rather than the cause?
While it’s possible that applying handcuffs is the best possible solution in your case, consider that there are other solutions along a spectrum that goes from “enforcing discipline” to “relying on individual discipline”.
Any feature that’s enforced at all times will end up hampering efficiency and flexibility in some cases, while any feature that’s left up to developers is liable to not be applied consistently.
The job of the person setting up code-style configuration is to thread that needle, tailoring the configuration for the team and solution at hand.
If you have a lot of solutions and teams, then you also get to consider the maintenance overhead of having too many custom configurations. In that case, you might want to make a few standard bundles that group teams and solutions, like “legacy”, “modern”, “junior team”, etc.
You don’t have to name them like that, but the name should give you an idea of how loose or restrictive the settings would be.
I don’t have time for all of that. Let’s just run them on the CI. Warnings as errors in the cloud FTW!
Now you’re allowing team members to push all the way up to the server before they realize that they have errors. Granted, they’re actually warnings, but you can’t merge to master until you fix them, so, yeah, they’re errors. This isn’t less annoying.
But, but, but, what if they're, like, real warnings? Like "possible NullReferenceException" or something like that? That's a good point, sure.
But, in most cases, it’s something more like “extra line found at end of file”, “space missing after parenthesis”, “method can be made private”, “class should be internal”, etc.
There are better—more automated—ways of addressing some of those, which we’ll discuss below.
Also, what if some warnings start appearing in your CI because of a tooling change? That can never happen, though, right? Because you’ve locked down all of your tool versions so that it can never happen? No? You didn’t do that? You’re using “latest”? Why?
The people building the tools are pretty clever, so we want to know what new things they have to tell us about our code.
Oh, right. Because it makes sense. If you lock down your tool versions, you run the very real risk of never finding out whether your build still works with more-modern tools. Years can go by without your changing anything in the build, leaving you stuck with those settings and old tools … until they're obsolete or no longer available on your build server.
It’s better to use “latest” and have an occasional spike of warnings than to just never know where you stand with newer toolchains. Locking down tool versions leads to things like DevOps having to set up on-site build agents with Visual Studio 2010 on them for certain projects.
OK, so we want to use latest tools, but that means that we might also get new warnings. These are a good thing! The people building the tools are pretty clever, so we want to know what new things they have to tell us about our code.
What we don’t want is for those new things to break builds that used to be running just fine.
This usually shows up when someone pushes new commits, runs the CI, and sees that they’re getting errors that they didn’t see locally. WTH? “My code didn’t cause those errors?”
The drawback is that this is (A) annoying and (B) the new errors are very possibly a distraction at this point in time. The person's bug fix may be important, but the new warnings have now bumped themselves to the top of the priority queue!
And what if the person whose build has failed isn’t well-qualified to address these new warnings? Well, then they get to bump the new warnings to the top of someone else’s priority queue! Probably a more senior developer. Fun for all!
What’s the solution then? Well, if you realize that the new warnings appeared because of a tool change, then I suppose you should try to pin the tool version on the CI, with all of the drawbacks outlined above.
That’s assuming that the person to whom this happens is (A) capable of figuring this out and (B) knows how to pin the tool version. And (C) we don’t really like that solution, for the reasons outlined above.
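For completeness, if you do decide to pin: in the .NET world, that's a global.json next to the solution. A minimal sketch; the version number here is hypothetical:

{
  "sdk": {
    "version": "8.0.100",
    "rollForward": "latestPatch"
  }
}

With rollForward set to latestPatch, the agent only picks up servicing releases of that SDK rather than silently jumping to a new feature band with new analyzers and new warnings.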
What about if we think again about what we’re trying to accomplish with “warnings as errors”?
Thinking…🤔🤔🤔…
Each solution should be able to decide what is an error and what is a warning and what is a suggestion. You can’t make “possible null-reference exception” an error in some legacy solutions without completely killing forward progress.
We want warnings to indicate potential problems, but be careful about forcing a solution to address all of them immediately. It’s more realistic to create tasks to slowly eliminate warnings, only switching a setting to an error later, to prevent future transgressions.
If the developer is focused on something, they shouldn’t be forced to switch modes and prioritize formatting. Use gentle, visible hints, unless it’s really, really relevant to what they’re working on.
For example, a possible NullReferenceException is something to be avoided, but is it really an error in all code? It's definitely a warning, but if the developer knows that it doesn't matter right now, then they should be able to ignore it, no?
I mean, they haven't even committed it yet (as far as you know 😉). Maybe they have a breakpoint to see how the heck that variable could be null in the first place and they were just going to bounce the EIP past the crash anyway. YOLO.
Anyway, we want to be really careful about how pushy we are with the IDE configuration. We want to strike a balance between missing actual problems and decreasing efficiency. We don’t want the developer above to have to write a suppression—or, even worse, do some other, ad-hoc short-circuit of inspections—in order to keep working.
Something should fail only on CI as a last resort. That is, a developer must have tools that make it relatively easy to pass CI. This includes being able to see all warnings in the solution, knowing whether those warnings would fail the CI, and having an easy way to apply formatting to all files, if incorrect formatting would fail the build.
We want to avoid a process that leads to half of our commits being called “fix formatting” and “remove warnings”. So, we should consider things like having the IDE auto-reformat files on save.
Inspections should be applied and made visible as quickly as possible, to give the developer the opportunity to produce conforming code from the get-go. The path of least resistance should result in committing code that will also pass CI.
We don’t want to encourage “noisy” commits that “fix up” formatting or other inspection violations. We would rather have a high signal-to-noise ratio in our commits. We want compact, descriptive commits—so we don’t want bug-fix commits to include formatting changes to other parts of the file, if we can avoid it.
Looking at these requirements, we have to conclude that the “warnings as errors” configuration option is an absolute cudgel that we had to use in the old days because we didn’t have fine-tuned control of the inspection-configuration.
Can we do better today, with modern tools?
Absolutely, we can! Most modern IDEs support .editorconfig, which allows fine-tuned configuration of both code style and formatting, especially for languages like C# and TypeScript/JavaScript. The wide variety of JetBrains IntelliJ-based tools use it as well, e.g., PyCharm, WebStorm, and PhpStorm. Visual Studio understands it. Visual Studio Code understands it.
Of course, the devil is in the details, and the degree to which code-inspection configuration carries over from one IDE to another depends very much on the level of standardization for that language and environment. The .NET/C# world has a high degree of standardization, which is very helpful.
EditorConfig allows you to control almost anything you can think of about your code style or formatting. These are called inspections, each of which you can configure with an inspection-specific value and a severity to assign when the inspection is triggered.
For example:
dotnet_style_require_accessibility_modifiers = for_non_interface_members:silent
dotnet_style_prefer_auto_properties = true:silent
The two inspections above should be relatively obvious. In both cases, the preferred setting is configured, but the severity is “silent”, so the IDE doesn’t complain about it.
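If you do want the IDE (and the build) to complain about something, you can escalate individual rules instead of flipping a global switch. A sketch; the rule IDs and severities here are illustrative, not a recommendation:

# An analyzer rule you genuinely care about becomes an error
dotnet_diagnostic.CA2000.severity = error
# Stylistic preferences stay as suggestions
csharp_style_var_for_built_in_types = true:suggestion
# Pure formatting rules are applied when the IDE reformats
csharp_new_line_before_open_brace = all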
What’s the point of configuring a preference and then not showing it to the developer?
Ah, because the developer is not the only one modifying the code.
Excuse me?
Don’t forget that the IDE will auto-format the code when requested. The IDE also writes code when it refactors anything. It needs to know how to format the code that it’s inserting or modifying.
The IDE uses the configuration in the EditorConfig to determine how to format the code. Your tools guy can configure the EditorConfig to conform to the style that the solution / team wants to use. When the code is auto-formatted or refactored, everything should end up looking just the way they wanted it.
If you have a “silent” severity, that means it’s something that you don’t want the team wasting time with during development. However, if no-one ever auto-formats the code, then those inspections will never be applied.
You should consider the process by which your solution will be made to conform with silent inspections in the EditorConfig.
If the inspection severity is suggestion or higher [2], then the developer sees an indicator in the code when the file is open.
Suggestions, warnings, and errors are shown in the build output, as well. Of course, the developer can disable showing warnings and messages (where suggestions appear) in the error-list pane, but you can’t control everything—and you shouldn’t try.
Give your developers the tools and configuration to be efficient and produce good code, but try not to be too pushy about when they do it.
If the inspection severity is silent or none, then the inspection setting is only used by auto-formatting and refactoring tools.
In this case, you'll have to consider when your code will actually be formatted. Do your developers occasionally auto-format files? Do they auto-format on save? Is there a step in the CI that auto-formats everything before compilation? If so, does it commit those changes? Or does the CI reject the push for formatting warnings?
If you have silent inspections, be honest about when they’re going to be applied. If you don’t have a plan, then they will be applied seemingly randomly when someone inadvertently triggers the hotkey for auto-formatting a file [3], which may lead to unpleasant surprises and/or messy commits.
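One concrete way to make that plan explicit, assuming a .NET solution (adapt to your own toolchain), is dotnet format: developers run it locally to apply the EditorConfig rules, and the CI runs it in verification mode so that it reports non-conforming files instead of silently rewriting them.

# Locally: apply formatting and style fixes from the EditorConfig
dotnet format
# On CI: fail if anything would change, without modifying the sources
dotnet format --verify-no-changes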
Let’s clear up the distinction between these two main groups of inspections.
Using var instead of an explicit type can, in very rare cases, lead to code that no longer compiles. By now, many IDE tools are generally clever enough to avoid even suggesting such a change, but it can still happen.
So, we've examined inspections in detail and talked a lot about setting severity to optimize the developer feedback loop, i.e., we don't want to mess with a developer's priority queue unless absolutely necessary.
But aren’t there some things that we might allow a developer to do locally but not allow to pass CI? That’s where the “warnings as errors” setting ensured that the CI never passed, even if the developer forgot to check something locally. For example, it’s important to have consistent formatting before attempting a merge.
There are other ways to encourage and support proper coding practices, though.
Pre-commit hooks can run locally, running global formatting on the code base before a developer can commit. This is kind of touchy, as sometimes developers are just committing a WIP to avoid losing their changes. It would be annoying if you had to clean up your formatting just to commit those.
You could include auto-formatting in the commit hook, but it’s probably better to set up auto-formatting in the IDE.
Instead of a local pre-commit hook, you can configure a pre-receive hook on the server. This hook could cause a push to be rejected if its head commit doesn't conform to certain conditions.
But…isn't that what the CI is for? Well, kind of, but the CI runs only after the commits have landed on the server. It's preferable to have the developer fix commits locally before being able to push, again, to avoid "fix formatting" and "cleanup warnings" commits.
You could choose which branch patterns to run these on.
My recommendation is to lean as heavily as possible on IDE configuration before getting lost in the weeds with commit hooks.
As soon as we start talking about “fixes” for warnings or formatting, we’re talking about “noisy” commits. If we enforce inspections more strictly on CI than we do locally, then there will be more “fixup” commits.
OK, so what do we do about them?
Squash ‘em!
Right? Right?
🫠
Kind of. Look, the PR machinery allows you to merge, rebase, squash-merge, or squash-rebase. That’s OK, but it’s not great. A lot of times, you’ll have four commits that are descriptive and semantically relevant, describing changes that were made, as well as a few commits that address problems that either came up in CI or as part of the review. Don’t you think you should squash those into the four commits and make a clean history instead of just squashing the whole lump into one big hairball?
Or do you think that each PR should have only one commit, equating a branch with a commit (as e.g. plugins like Graphite positively encourage)? I recently wrote PRs suck. Stop trying to fix them., an article that also touches on the workflow outlined below.
You see how tool configuration affects everything? You have to think about how your team builds PRs, how they review PRs, how they repair PRs after review—or whether they even use PRs.
I would encourage a more real-time review culture, where possible.
What’s the problem? Don’t you trust your team members to decide what to do with their own highly ephemeral feature branches?
Allowing force-push encourages team members to care about what the commit history looks like. It gives them a tool that allows them to revise their commit history until it tells a coherent story. See Rebase Considered Essential for a longer discussion on rewriting commit history.
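Git already supports this workflow directly; a sketch, with a hypothetical commit hash:

# Mark a review-feedback commit as belonging to an earlier commit
git commit --fixup=abc1234
# Fold all fixup! commits into their targets before merging
git rebase -i --autosquash main
# Publish the rewritten branch; refuses to push if the remote has moved since you last fetched
git push --force-with-lease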
Phew! So, what have we learned?
If that all sounds like a lot, well—it is. Building clean, maintainable code is a complex undertaking. There are a lot of tools that can help, but you have to put some time into thinking how you want to use them, and then into configuring them so they help you instead of getting in your way.
It’s a delicate balancing act: to give developers the best chance of (A) producing conforming code in the first place and (B) avoiding “noisy” commits, while (C) not hitting them with priority interrupts irrelevant to what they’re working on. There will be tradeoffs.
Once you’ve set up a couple of solutions, you can just copy/paste the configuration to others as a starting point. Remember, though, that solutions are usually pretty unique. Only consider generalizing or packaging a configuration if you’ve considered that,
For these reasons, each solution having its own copy of the configuration is probably better. They can just copy/paste—the horror!—improvements where appropriate. If you’re worrying about configurations drifting out-of-sync, schedule a work item every few sprints that evaluates and possibly re-syncs configurations.
There are always trade-offs. Improving code-quality is an incremental process. So is configuring the tools that support that process. It gets easier with practice. Good luck!
There is a bit of a mismatch between using EditorConfig and the JetBrains-native configuration: JetBrains tools support an additional severity level called "Hint", which is generally shown as a green squiggly line rather than the blue one used for warnings. However, if you set the severity to "hint", Visual Studio interprets it as a warning, showing it as such in both the IDE and in the build output.
On top of that, JetBrains seems to think that the silent option is called none, although it seems to understand silent well enough.
Published by marco on 15. Dec 2023 11:37:03 (GMT-5)
Updated by marco on 15. Dec 2023 11:58:51 (GMT-5)
I read through the article Your GitHub pull request workflow is slowing everyone down (Graphite.Dev) with great interest because I, too, am not thrilled about how PRs work. While I agree with the problems Graphite sees with PRs, I think they miss other problems—and I don't like their solution very much.
“The single most important bottleneck is PR size − large PRs can make code reviews frustrating and ineffective. The average PR on GitHub has 900+ lines of code changes. For speed and quality, PRs should be maintained under 200 lines—with 50 lines being ideal. To put this in perspective, where giant 500+ line PRs take around 9 days to get merged on average, tiny PRs under 100 lines can make it from creation to landing within hours.”
Holy shit! The average is 900 lines? That’s already using the system completely incorrectly. That’s so wild. It absolutely confirms my theory that PRs are a terrible way of committing code. I already thought they were terrible just because of the limited UI and lack of introspection of what the code you’re reviewing actually does.
PRs don’t encourage starting and running the change to verify that it actually works as advertised. You’re not using any of the tools that you use to develop code to review it. How silly is that? If you load changes into an IDE, you can see how many warnings there are, see if the layout shifts when you format the document, etc. Why would you want to review in a completely different environment? As Robin Williams once eloquently put it, It’s like masturbating with an oven mitt. (YouTube).
Not only that, but people probably aren’t looking at individual commits, so they’re just reviewing 900+ lines at once. The fewer people there are looking at individual commits, the fewer people there will be who make good, individual commits. This is a shame because it would counteract the awfulness of reviewing code in the PR web-UI, at least a little bit.
There are far better and more efficient ways of reviewing code than with PR web UIs. Reviewing through a PR web UI should be a fallback that you only use when nothing else is possible.
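With git, you don't even need the web UI to get the changes in front of your real tools. On GitHub, for example, a PR's commits can be fetched into your local clone; the PR number and branch name below are placeholders:

# Fetch PR #123 into a local branch and switch to it
git fetch origin pull/123/head:review/pr-123
git switch review/pr-123

From there, you can build the code, run it, run the tests, and step through the individual commits in your IDE.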
If you're in the same time zone and working on the same schedule as the rest of your team, there is absolutely no reason why you should be using the PR web UI instead of real-time reviews of local commits.
What the current PR machinery does is fool remote, async teams into thinking that they’re reviewing code efficiently. A face-to-face, real-time review will be much more efficient and yield much higher-quality code.
I honestly can’t believe the high pain threshold that some developers have.
If the developer hasn’t pushed yet, then:
If the developer has pushed and is not available for real-time review, then:
Apply your own commits instead of review notes wherever possible.
Yes, you can do this! Why not? You’re both on the same team. It’s a shared code base, not someone’s personal zen garden. Instead of explaining what you would want changed, just make your suggestion in the form of a commit. It’s often more efficient than writing prose.
You can thank me later.
“Problems can easily get hidden between the diffs, and reviewers often make assumptions instead of testing to avoid feeling overwhelmed. One particularly interesting finding is that as the size of a PR increases (by number of files changed), the amount of time reviewers spend on each file decreases significantly (for PRs with 8 or more files changed).”
Obviously! But it’s good to measure—this was my intuition. PRs don’t encourage local testing or verification in an environment similar to that which the original developer used.
“By default, every PR is restricted to only 1 commit of <200 lines, keeping changes tightly scoped. This forces developers to consciously limit work to related changes—the registration endpoint PR can’t sneak in unrelated styling tweaks.”
Yikes! I don’t like the sound of that. So you make multiple PRs rather than one PR with multiple smaller commits? Why don’t you just review commits rather than one giant blob? Do you really need to corral each commit into its own branch and PR to force yourselves to actually make useful commits?
Yeeess? 🧐
“Stacking centers around breaking down big feature work into chains of smaller pull requests. Each PR is typically limited to 1 commit focused on an isolated change. This restriction guides developers to consciously make only a single change, squashing and rebasing along the way, instead of cluttering the PR with random unnecessary commits like “typo fixes”.”
This is yet another technique invented to accommodate teams that don’t trust each other, or that contain people who, if they can’t be trained to do better—or don’t understand what better is—probably shouldn’t be programming yet. Instead of teaching team members how to use their tools, they impose an arbitrary rule. What a kindergarten.
“Unlike Git workflows, where it is easy to neglect staying updated, Graphite centers your workflow around continually integrating with the current mainline state.”
Yikes! I don't love the sound of that, either. Doesn't that force you to spend more time on integration that you might otherwise have spent working? I understand that you don't want long-lived branches, but now you're just shooting to the other extreme, forcing integration on every pull.
It’s not bad as long as the integrations are automatic, but might not be appropriate for developers who aren’t great at resolving merge conflicts. Even if they know how to deal with them well, might they not waste time resolving conflicts integrating a version of their code that wasn’t at all ready to be integrated?
I understand that this feature follows from the logic of “if you integrate more often, then integration is easier,” but, again, you’re taking agency out of developers’ hands, implicitly not trusting your team members. I don’t like it.
If you have several stacked commits, I wonder how much shuffling there is in the working tree (causing unwanted IDE reloads) during the integration cascade. Are they somehow integrating without touching the working tree? I don’t know that that’s possible.
Go ahead and work on the main branch if you want—I do it all the time—but this should be more of a choice than it sounds like it is.
“This command will add your changes and create a new branch in one motion. You can then continue iterating by creating and stacking additional branches:”
Ah, I see now. They’ve reinvented Mercurial’s patch queues. Everything old is new again.
A really bright and good friend of mine added an extension to Mercurial's mq decades ago that sounds like it works the same. I remember discussing the technique with him as he was developing it.
I’m a bit worried about two things:
“By cleaning up your PR commit history, you ensure a clear and concise main branch history that makes it easy to see exactly what’s changed over time.”
By enforcing one commit per branch, you dumb everything down.
It does seem that, instead of acknowledging that PR supremacy is stupid, Graphite doubles down, strips branches of most of their functionality by equating them to commits, and uses multiple PRs to force people to review by commit. It seems like a waste.
But, hey, maybe I need to actually try it. I might be missing something.
Still, instead of adding another tool, I think you should use git better.
Published by marco on 30. Nov 2023 21:23:21 (GMT-5)
Updated by marco on 30. Nov 2023 21:43:00 (GMT-5)
The article Some notes on Local-First Development by Kyle Matthews (Bricolage) describes a very good trend in app development, but focuses a bit too much on what he calls DX, or developer experience.
"I see "local-first" as shifting reads and writes to an embedded database in each client via "sync engines" that facilitate data exchange between clients and servers. […] The benefits are multiple:"
- Simplified state management for developers.
- Built-in support for real-time sync, offline usage, and multiplayer collaborative features.
- Faster (60 FPS) CRUD
- More robust applications for end-users.
I don’t want to read too much into it, but he did mention end-users only in the last bullet point.
I think the author is focusing too much on the tech and too little on the value. DX is great and all, but it's about the UX, no? Every app would benefit from realtime updates if they're cheap and easy to build. Almost every app is multiplayer, if you think about it a bit.
“For almost any real-time use case, I’d choose replicated data structures over raw web sockets as they give you a much simpler DX and robust guarantees that clients will get updates.”
No, my friend. You’ve come to the right conclusion for the wrong reason.
If the tech is solid, if it doesn’t negatively influence debuggability or traceability, if it’s predictable, if operations can be correlated, if you don’t end up limiting your functionality to fit the framework—then go for it.
What I mean is that it’s important that the thought process that leads to the correct conclusion serves all stakeholders. If you’re only doing things because they’re better for developers, then, eventually, you’re going to be deciding against the users.
Be aware of the trade-offs, and be sure all of the stakeholders can live with them. What does good DX translate to for other stakeholders? Easier maintenance? Less complexity? Easier onboarding? The DX is really mostly secondary unless you’re making a framework, in which case it might matter. No-one cares about DX for real-world products. I love good DX, but I’m a developer! As a developer with a lot of experience, I’m forced to admit that it’s not at all a primary goal. Having good DX might lead to other desirable things, but that doesn’t make it directly desirable. Don’t forget that.
Published by marco on 8. Nov 2023 21:50:04 (GMT-5)
This is a brilliant interview, in that Oren Eini just talks for about 40 minutes, answering pretty much just one or two questions.
At one point (I forget where), he says,
“I don’t like unit tests.”
Agreed. I like, no, love automated tests. They're indispensable. But I think unit tests are only useful when you want to focus on a failing integration test. David rightly points out that they're really good for pinpointing where a problem actually happens, but Eini says that they also "hinder change" because, by their nature, they lock down a lot of the design and implementation. This is absolutely true.
Just to be clear: I think of anything that’s not a unit test as an integration test. I generally like “smaller” integration tests.
It’s probably better to just be agile about it and write them when the situation requires it, i.e., when the cause behind a failing integration test is proving difficult to pin down—or when you’ve determined the cause and you want a direct proof that you’ve fixed the underlying problem.
It requires discipline to realize when you need to write more unit tests in order to help pinpoint which component involved in a failing integration test is causing the problem. If you preemptively write all of the unit tests, you're wasting time that could be better spent elsewhere.
I have had no small amount of success with a large test suite that was mostly integration tests. It ran relatively quickly (10 minutes for 10,000 tests on a reasonably specced developer desktop) and helped me survive three major refactorings.
Published by marco on 24. Oct 2023 22:39:45 (GMT-5)
The following video is a talk by Robert Martin ("Uncle Bob"), one of the graybeards worth listening to. This video from 2011 is wide-ranging and contains a lot of brilliant advice. It's stuff that we've known for a long time now, but every generation of programmers needs to re-learn these things about every 5-10 years. You usually can't stop people from just reinventing the wheel because who wants to watch videos of or read blog posts written by old dudes, amirite?
At 10:00, he talks about how the top-level architecture of most applications reflects the framework used to implement the web-delivery mechanism rather than the purpose of the application itself. In his example, he shows how a Ruby-on-Rails application is immediately recognizable as such, but that you have literally no idea what the application does.
He urges us to consider what this implies about our priorities as architects and developers. It means that we are much more concerned with the technology than with the functionality. This is not good.
He contrasts it with a high-level, 2-D blueprint of the first floor of a church, where the intent is obvious: it's a church (he says). Of course, inferring that it's a church involves applying the appearance of the diagram to a given context—e.g., a very western one—but the point is clear: the standard, top-level view of the design of a church screams out that it's a church. It says nothing about how the church is to be built—or has been built—it says what it is.
“Architecture is about intent.”
Just to be clear: this presentation is from 12 years ago, and we’re still confronted with the same concepts—still confronted with the same failure to remember these precepts. Our frameworks still push themselves to the fore.
This is, in a way, the problem with LLM-generated code: we are already terrible at expressing the intent of our software in a way that makes it maintainable and qualitative. We are already mostly terrible at designing and building things in a way that satisfies the nearly-always-implicit non-functional requirements, like maintainability, usability, performance, etc.
And now we're taking another piece of software, whose workings we can't yet fathom, but which we know we've built by feeding it all of these terrible versions of software, and asking it to write software for us. All of the theory that we've developed about how to build software will not be respected, except by luck, if the neural net feels like that's a high-probability next token.
On the one hand, I have to admit that this doesn’t sound much different from how software is built today, except that the human builders are potentially capable of following rules, whereas the software-based builders are less trainable. Again, though, we have decades of experience showing that, while people are ostensibly trainable, they are not necessarily practically trainable, at least in the general case for the general type of person who takes part in this field of endeavor we call programming.
Which leaves us with the question: have we achieved the maximum potential in software development? We already knew everything we needed to know about how to do it decades ago. What is missing is the will to do it that way. It’s definitely possible to train people to do it that way. The hangup is, as always, the cost, specifically, the cost-benefit ratio. The perceived benefit of better software is usually far less than the perceived (initial) cost.
And we always perceive only the initial cost because we are super-bad at long-term thinking about complex problems like building software.
At 34:00, Uncle Bob says
"There's gotta be some better way to do this. […] This is just 3270 programming poisoned with all sorts of crud. How many languages do you have to know to write a web application? Well, there's some programming language, but that's incidental! You've gotta know HTML and CSS and JS and Zazzle and Dazzle and … and, you know, the guy over here's going: 'let's build communities by leveling people up. Leveling them up! I mean, what we're going to do is hand them a … OK, now, hold this hammer. Ok? Good. You got that hammer? Now, here's another one. Hold that hammer too. Now I've got a big barrel you've got to hold on your head.' We are not helping our cause with this truly terrible mechanism that we have adopted."
At 41:00, he says
“The database is a detail.”
This reminds me of The UI is an afterthought, a detail, an article I wrote recently [1] about a 7-year-old video I watched that expressed the same sentiments about external systems that Martin is expressing in his 12-year-old video.
“That’s what architecture is: find some place to draw a line and then make sure every dependency that crosses that line goes in the same direction.”
At 55:45, he says,
“There’s an interesting case of the database—the thing that’s so incredibly important—and yet, we took that decision and we just deferred it off the end of the world and then, when somebody needed it, we shimmed it in in a day. Because our architecture had done something right. What is the hallmark of a really good architecture? A good architecture allows major decisions to be deferred.”
“A good architecture maximizes the number of decisions not made.”
At 1:00:50, he says,
“How do you keep the beast under control? You need a suite of tests you trust with your life. You must never look at that suite of tests and think ‘you know? I don’t think I really tested everything?’ As soon as you think that, you’ve lost it. Because now you’re afraid of your code. The reason we write our tests first is so that we know, that every single line of code we wrote was because of a failing test that we wrote. So that we know that every single decision that we made is tested. So that then, we can pull up that code on our screen and say ‘Oh my God, that looks like a mess’—and clean it!…without any fear.”
Great talk. Add it to the pile of things that we know—or should know—better, but don’t.
Published by marco on 11. Oct 2023 21:15:00 (GMT-5)
Updated by marco on 6. Mar 2024 07:35:06 (GMT-5)
As I was reading the absolute train wreck of a unit test in Testing with a Lisp (Daily WTF), the song “What the fuck is going on?” popped into my head, like it always does when I see that a programmer not only didn’t understand the assignment, not only doesn’t know how to program, but also doesn’t know that they don’t know how to program.
They are living their best life because they don’t think that “knowing how to program” is required in order to be a programmer. Neither does their boss or team, I guess.
That’s when the music starts to play in my head, and I think of little blind Dillon playing football because a very non-PC friend [1] sent me that video so many years ago.
Am I going to link the video? Of course I am. Because I’m a terrible person. [2]
And this is the test from the article above.
test("Returned objects arguments immutable (a b)", function() {
    var result = lispParser("(a b)");
    expect(3);
    ok(typeof(result) === 'object', "result is an object");
    var children = result.arguments;
    var newValue = 2;
    var firstChild = children[0];
    if (children[0] == newValue) {
        firstChild = ++newValue;
    }
    notEqual(result.arguments[0], newValue, "Underlying array was immutable");
    equal(result.arguments[0], firstChild, "Underlying array was immutable");
});
Nothing about that test makes any sense. It will always pass. It is, in its own way, a work of art. It is the JavaScript equivalent of Chomsky’s Colorless green ideas sleep furiously (Wikipedia), an example of a sentence that is “grammatically well-formed, but semantically nonsensical”.
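For contrast, here's one plausible reading of what the test's author might have been aiming for: mutate the returned structure and verify that nothing observable changes. This is only a sketch; I'm guessing at lispParser's contract from the original test:

test("Returned object's arguments are immutable (a b)", function() {
    var result = lispParser("(a b)");
    expect(2);
    var original = result.arguments[0];
    // Attempt to mutate the returned structure; a frozen array either ignores or rejects the write.
    try {
        result.arguments[0] = "mutated";
    } catch (e) {
        // Ignore: rejection of the write is exactly what we're hoping for.
    }
    // Verify that neither the returned object nor a fresh parse was affected.
    equal(result.arguments[0], original, "returned arguments cannot be modified");
    equal(lispParser("(a b)").arguments[0], original, "parser state is unaffected");
});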
Honestly, this looks worse than anything I’ve seen my students try to write. They usually have enough shame that they don’t bother filling in an answer if they really have no idea what’s going on.
I’m also wondering, of course, whether this is the work of an AI—or the bastard child of a poseur-programmer and an AI. The future is bright.
Published by marco on 5. Oct 2023 13:48:10 (GMT-5)
Updated by marco on 10. Oct 2023 06:22:28 (GMT-5)
I was recently asked something like the following question, which I am citing with a few minor edits.
We would like to do a course about SW development with Python, preferably an online course, so that we can start at our own pace.
We don’t want a Python course, but would instead like a course more about SW development. It would be great if it were in Python because we are comfortable with it.
The interesting topics would be:
- object-oriented programming
- functional programming
- design patterns
- good coding practices
As well as other important topics such as:
- Testing
- Documenting
- Version control
- Working in a team with version control
The course doesn’t have to contain all these topics. It can be also several courses or it can be toy-projects from somewhere.
I have very little familiarity with courses as I’ve usually been tasked with figuring out how to do things before others have gotten to it. Of late, I’ve been teaching courses, not taking them.
So, how did I learn what I know about software development? When I started writing software, there was nothing available online, outside of a bunch of GeoCities pages (one of which was mine). MSDN was on CDs or local help files.
I read some books, OOSC and OOSC2, as well as the Gang of Four’s Design Patterns. I can’t remember what else, but that’s partly how I leveled up my skills. I had the great fortune of being able to build and work on large frameworks, from which I drew many lessons. I worked with very good people, who challenged me and taught me a lot.
Nowadays, I use DuckDuckGo as my online reference. I have developed a relatively advanced skill at searching for what I’m looking for. I very often get it within minutes. I almost never use videos.
A primary skill in software development is to be able to imagine what you should be looking for. That is, you don't have to know how to do everything without looking it up, but you do have to imagine that it might exist.
For example, I don’t know how to write automated tests in Python, but I know that it should be possible. I know that I should figure that out very early in my experiments with Python. I know what to expect from an automated-testing environment. I know which settings to look for and expect.
That kind of knowledge transfers from one language or development environment to another. I know that code-completion makes me faster, and I know that I would like to avoid runtime errors—how can I best use Python to achieve those ends?
I took a quick look around for online courses, but was not immediately convinced that I am equipped to be able to distinguish between scams and actually worthwhile courses. Does the course even mention general software-development principles? How much time is allocated to that?
The Complete Software Engineering Course with Python (Udemy) looks as follows:
What about general programming?
Just over nine minutes? And you can’t even be bothered to describe it in something approaching well-written English? No, thanks.
The course Learning To Program − Part 2: Abstractions (PluralSight) looks a bit more professional, but it still has some quirks (especially for $29 per month).
There is an assessment that you can take, but you have to sign up first.
Maybe PluralSight is able to tell you which courses you need, but I doubt it will err on the “you need fewer courses” side.
I’ve recently heard from a source I’ve been watching for a while that this course is quite good for C# developers: From Zero to Hero: Test-Driven Development in C# by Guilherme Ferreira. The person recommending it releases quite interesting/advanced videos on YouTube and has his own range of courses at DomeTrain.
How would I teach basic software-development principles? I would probably start with very abstract principles that try to answer the classic questions for “use cases”:
A question people tend to start with is: which programming language should I use?
That’s the wrong question.
The applicability of programming languages to various fields differs widely, but most languages have a large overlap in functionality. Where they differ is in the degree of runtime or library support for specific tasks.
For example, Python famously has a lot of libraries for number-crunching and data-analysis (although I feel that this advantage is grossly exaggerated) whereas it’s terrible for writing Windows GUI applications. C#/.NET has excellent web and desktop technology support. The Python runtime is notoriously slow (with essential libraries written in C++) whereas .NET is known as a very performant cross-platform runtime.
Do you see how quickly the conversation turns from “what can the language do?” to “what can the standard runtime/libraries/environment do?” That’s because you can do most tasks with most languages.
Instead, we want to think about this at a higher level. We want to,
Programming languages exist on several spectra. One of these is “the degree of developer discipline required to use the language effectively and safely.”
What does that mean? For example, Python and JavaScript have a dynamic type system. There are mechanisms, practices, and IDE support that you can use to set up guardrails missing from the language, but they are optional, and idiomatically written code in both of these languages tends not to use any of them. It's the wild west, for the most part, with a lot of assumptions that nothing will ever go wrong.
More strict languages force you to consider all possibilities before your program even compiles or runs. For example, Haskell and Rust are famously picky. If you have a function that returns a value under certain conditions, those languages will make you explicitly indicate what to return when those conditions don't hold. Forgiving languages will just use some default value, usually null or undefined.
This is called “happy path” programming because you only write the code for the hoped-for path through your use case. For example, the user selects a valid file with the expected data format with an acceptable length with no validation or processing errors, generating a data file to which the initiating user has access.
Writing programs in this fashion is a dangerous thing to do with a strict language, and it’s even worse to do in a lax language.
Even the simplest software has many, many branches. The less your language or compiler or IDE reminds you of them, the more you have to fill that gap with developer discipline.
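To illustrate, here's a minimal Swift sketch with hypothetical names: the first version only handles the happy path and simply crashes if the file is missing or a line isn't a number; the second is forced to say what happens when things go wrong.

import Foundation

// Happy path: every step is assumed to succeed; a missing file or a
// malformed line crashes at runtime.
func totalHappyPath(path: String) -> Int
{
    let text = try! String(contentsOfFile: path, encoding: .utf8)

    return text.split(separator: "\n").map { Int($0)! }.reduce(0, +)
}

// Explicit version: the signature admits that this can fail, and the
// caller has to decide what to do about it.
func total(path: String) -> Int?
{
    guard let text = try? String(contentsOfFile: path, encoding: .utf8) else { return nil }

    var sum = 0

    for line in text.split(separator: "\n")
    {
        guard let value = Int(line) else { return nil }

        sum += value
    }

    return sum
}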
To get more concrete, some good questions to consider are:
If these don’t make any sense to you, don’t worry. But they are questions that are important when you’re choosing a tool for building software.
The whole point of a programming language is to express intent. You indicate what you intend to happen when a given event occurs.
A programmer expresses an intent by writing that "when this thing happens, I intend for this other thing to happen."
For example, a developer might express the intent "when the user taps Save, persist the document" without saying anything about how persistence actually works.
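A minimal Swift sketch of that intent; all of the names here are invented:

protocol DocumentStore
{
    func save(_ document: Document)
}

struct Document
{
    let text: String
}

class DocumentScreen
{
    let store: DocumentStore
    let document: Document

    init(store: DocumentStore, document: Document)
    {
        self.store = store
        self.document = document
    }

    // "When the save button is tapped, I intend for the document to be persisted."
    // Nothing here says how or where the document is stored.
    func saveButtonTapped()
    {
        store.save(document)
    }
}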
How do we choose a programming language? You’re not just choosing a programming language, you’re also implicitly deciding which subset of language features to use. This is predicated, of course, on knowing about these features. It’s best to inform yourself about what your language/libraries/runtime (let’s call it a software-development tool) can do for you—or find someone who is well-informed to help.
For each feature, you should ask yourself: how useful is it? Does it help me achieve my task?
Let’s take a look at high-level features of a software-development tool that may be important.
For code designed to be reusable (libraries, frameworks), you can also consider:
Which of the features above matters more depends on what you’re building. A one-off script doesn’t need to satisfy many of these features. A full-blown application that needs to be maintained for 10-20 years by different teams has to be much, much more careful.
This isn’t the first time I’ve written about these ideas, so I’ve included links to other, similar articles below.
These articles discuss the topic of software-development on a similar level to the discussion above.
The articles below are more recent, are more-or-less on the same level, but are also more targeted.
These white papers were written from 2006 to 2019 when I was still employed at Encodo Systems AG. They expand on recommended practices of specific facets of software development. They are presented in reverse-chronological order, but can be read in any order.
This is a YouTube playlist I’ve maintained for years that I continuously update whenever I watch a video that I think would be interesting for other developers. It’s only technology videos, but it’s pretty eclectic (i.e., it’s language- and technology-agnostic).
Developer suggestions (YouTube)
Pace yourself. You can’t have everything all at once. Programming takes wisdom. Wisdom takes time. It takes practice. It comes, or it doesn’t. It takes different forms.
As Rainer Maria Rilke wrote in 1903 [2],
"Forschen Sie jetzt nicht nach den Antworten, die Ihnen nicht gegeben werden können, weil Sie sie nicht leben könnten. Und es handelt sich darum, alles zu leben. Leben Sie jetzt die Fragen. Vielleicht leben Sie dann allmählich, ohne es zu merken, eines fernen Tages in die Antwort hinein."
("Do not search now for the answers, which cannot be given to you because you would not be able to live them. The point is to live everything. Live the questions now. Perhaps, then, gradually and without noticing it, you will one distant day live your way into the answer.")
Good luck.
Published by marco on 4. Oct 2023 21:54:06 (GMT-5)
Note: I found this old draft containing my response to a colleague.
I 100% agree with you, in general. I absolutely want to know immediately when an assumption I’ve made does not hold.
But…😁
The degree to which I'm willing to crash depends on whose consistency I'm basing my assumptions on. When I call a method in my code from another method in my code, I'm absolutely going to assert that an argument is not null. I can control that. My IDE will tell me when I might be passing null. That is definitely a programming error.
When I’m getting external input (e.g. from the Windows registry), I’m a bit more cautious because I’m less sure about how solid my assumption is. I know what the documentation says but a lifetime of programming has taught me that some things (like the Windows registry) are going to work exactly as expected on my (modern) developer machine, but are going to fail mysteriously on a (perhaps less modern) machine in (for me) completely unpredictable ways.
Therefore, I'm a bit careful about what I'm willing to pay to find errors. The primary purpose of a program is to bring value to the customer/user. I want to improve my program for more situations, but how am I going to find out in which situations it doesn't work?
I can test, of course, but some things will only ever happen in the field. If it happens in the field, then I’m using the customer’s/user’s time to help me fix my program (they benefit, of course, but not for free). Can I soften the blow to the user of having to help me improve the program without sacrificing consistency or accuracy?
Sometimes, the answer is a resounding no. The program absolutely cannot continue if, e.g., the reference to the data it needs to work on is null. That's a no-go. There's no rescuing the program from that or completing any other useful work.
In the case of this tool, if it crashes, the user no longer gets a report. Would they have been able to get some of the report if it hadn’t crashed? In this case, yes. All of the other checks could be run. The checks that crashed would show as “failed” with the exception message. That seems to me to be better than skipping all subsequent checks when one crashes.
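A minimal sketch of what I mean, in Swift with invented names: each check gets its own do/catch, so one crash turns into one "failed" line in the report instead of no report at all.

struct CheckResult
{
    let name: String
    let outcome: String
}

func runChecks(_ checks: [(name: String, run: () throws -> Void)]) -> [CheckResult]
{
    var results: [CheckResult] = []

    for check in checks
    {
        do
        {
            try check.run()
            results.append(CheckResult(name: check.name, outcome: "passed"))
        }
        catch
        {
            // The failing check is reported with its error; the remaining checks still run.
            results.append(CheckResult(name: check.name, outcome: "failed: \(error)"))
        }
    }

    return results
}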
I can even continue to hope that the user then reports the mysterious error message they got for one of the reports! Die Hoffnung stirbt zuletzt! (Hope dies last!)
I’m delighted to discuss programming and error-handling philosophy in person next week!
Published by marco on 4. Oct 2023 21:36:27 (GMT-5)
This article is a copy of the white papers and process description that I wrote for Encodo Systems while I still worked there. I’ve preserved a copy of it here and in the linked articles.
Through our many years of experience building software, we’ve accumulated methodologies and principles that lead to quality software.
Listed below are our methodologies.
Published by marco on 4. Oct 2023 21:36:20 (GMT-5)
Encodo keeps the SOLID principles in mind when designing software.
We implement the Inversion of Control [I] pattern with the dependency-injection pattern (D) to allow for a large amount of flexibility in how an application is composed. We’ve applied this principle throughout the Quino framework and use it in our products as well.
What does this mean? It means that the product or framework doesn’t make any decisions about which exact components to use. Instead, it indicates the API Surface (interface) that it expects in the form of injected components. That is, the responsibility for deciding which component to use lies not with the lowest level of the software stack, but with the highest level.
This inversion means that the application entry point configures the object graph (i.e. which objects will be used). That makes it much easier to isolate and test individual components, especially where those components would depend on native- or web-only functionality in production.
See the How do I DI? presentation from February 2018 for more information.
An application is a graph of components, each with one responsibility (S) and zero or more dependencies, injected via the constructor. Components are composed with other components to build higher-level functionality (O). They are also unaware of the other components’ implementations and can be replaced with other implementations (L).
Components make software flexible:
Components have a very clear purpose (S) indicated through an interface. In most cases, we use an actual “interface” language construct to clearly define the API surface and to not limit a product in its implementation (e.g. with an abstract base class).
Most components have a single method, amounting to a functional interface and allowing composition with lambdas. While TypeScript has this feature (as does Java), C# does not. We end up defining a lot of single-method classes that implement a single interface. It’s more code than we’d like, but it’s purely structural syntax and doesn’t introduce additional complexity.
See the Interfaces, base classes and virtual methods in the Quino conceptual documentation for more information on, and examples of, the patterns that we use.
Although it’s possible for applications to manually create an object graph (the composition root), we prefer to use an IOC Container.
The container provides two services:
The container introduces the following restriction:
The lifetime of an application is as follows:
See the Quino Application Configuration for more information about application lifecycle. The blog article Starting up an application, in detail is a bit older, but provides more detail on how Quino integrates the IOC into the startup.
In the long example below, we will first look at how composition even without a container is very powerful. Then we’ll look at how a container can improve on that.
Although we generally use C# or TypeScript in our work, these examples were originally written to introduce Swift developers to an iOS framework that we wrote.
Let’s take a look at an example of an application that looks OK at first, but turns out not to be very flexible.
Note: The example is small, so some of the steps will feel like over-engineering. It’s a good point, but the principles shown here apply just as well for larger systems.
The following example defines a simulator that can move a robot along a route, defined by movements. The robot starts at a given location and can travel at a fixed speed.
enum Direction
{
    case north
    case south
    case east
    case west
}

struct Movement
{
    let direction: Direction
    let distance: Int
}

struct Point
{
    var x: Int
    var y: Int
}

class FastRobot
{
    var speed = 2
    var location: Point = Point(x: 0, y: 0)
    let movements: [Movement] = [Movement(direction: .north, distance: 1)]

    func move()
    {
        for movement in movements
        {
            let distance = speed * movement.distance

            switch (movement.direction)
            {
            case .north:
                location.y += distance
            case .south:
                location.y -= distance
            case .east:
                location.x += distance
            case .west:
                location.x -= distance
            }
        }
    }
}

class Simulator
{
    func run()
    {
        FastRobot().move()
    }
}
As mentioned above, this implementation looks well-written, but what if we wanted to verify that the robot ended up at the right location? Let’s try that below.
Simulator().run()
// Now what?
It turns out that we can’t test anything in this application. We can fix this by applying the patterns outlined in the first section.
First, let’s tackle the Simulator interface:
class Simulator
{
    func run(robot: FastRobot)
    {
        robot.move()
    }
}
let robot = FastRobot()
Simulator().run(robot: robot)
XCTAssertEqual(robot.location.x, 0)
XCTAssertEqual(robot.location.y, 2)
Now we can test that the robot is working as expected.
The robot is still quite hard-coded, as is the simulator's relationship to the robot. The robot must be a FastRobot and it can only move along a fixed route.
We’ll first decouple the Simulator from a direct dependence on the FastRobot.
protocol IRobot
{
    func move()
}

class FastRobot : IRobot
{
    // As above
}

class Simulator
{
    func run(robot: IRobot)
    {
        robot.move()
    }
}
Now the simulator only knows about the protocol IRobot, which has a very small surface area. It's still too small to be very useful.
Instead of hard-coding everything, we can compose the robot out of parts. Examining the algorithm, we see three parts that could be externalized: the speed, the starting location, and the route (the list of movements).
Let's first externalize all of the hard-coded values out of the FastRobot into a generic Robot class.
class Robot : IRobot
{
    let speed: Int
    var location: Point
    let movements: [Movement]

    init(speed: Int, location: Point, movements: [Movement])
    {
        self.speed = speed
        self.location = location
        self.movements = movements
    }

    func move()
    {
        for movement in movements
        {
            let distance = speed * movement.distance

            switch (movement.direction)
            {
            case .north:
                location.y += distance
            case .south:
                location.y -= distance
            case .east:
                location.x += distance
            case .west:
                location.x -= distance
            }
        }
    }
}
Now we can create a Robot, injecting all of the initial conditions.
let origin = Point(x: 0, y: 0)
let route = [Movement(direction: .north, distance: 1)]
let robot = Robot(speed: 2, location: origin, movements: route)
Simulator().run(robot: robot)
XCTAssertEqual(robot.location.x, 0)
XCTAssertEqual(robot.location.y, 2)
The same assertions hold as before, but the Robot class is much more generalized. We can now test the robot's movement algorithm with various combinations of origin, speed and route.
At this point, we’ve made the robot and simulator composable and testable. Now we want to have a look at how we can separate the configuration from the usage.
We’re not nearly done, though. What does this all have to do with a service provider? That’s where the inversion part comes in.
In the very first example, the Simulator was responsible for creating the robot. This made it impossible to test whether the robot did what it was supposed to do.
So we passed the robot in as a parameter to run(), making the caller responsible for creating the robot instead of the Simulator.
This is fine, as long as the caller is the top-level part of the program, responsible for composing the objects that will be used. However, what if the direct caller doesn’t know how to do that? Or, put another way, what if the caller should not be doing that?
What if the caller is a button handler in a UI? Would we want the button handler—or the UI that contains it—to be responsible for constructing the robot or its initial conditions?
This is where the container comes in: we want to register all of the types and instances that we want to use in one place. This configuration can be retrieved at any later point without knowing any more than the interface that’s required.
This takes us full circle to the original code, except, instead of creating the Simulator directly, we want to get it from a container, called a provider in the following examples.
let simulator = provider.resolve(ISimulator.self)
simulator.run()
let robot = provider.resolve(IRobot.self)
XCTAssertEqual(robot.location.x, 0)
XCTAssertEqual(robot.location.y, 2)
Note: For reasons of simplicity, we assume that all objects in the container are singletons.
Let’s take the configurable code above and translate it to a container. Here the registrar is the configurable part and the provider is the part that can be used to retrieve objects based on that configuration. The registrar is sometimes called the composition root.
Note: We use the syntax of our own Swift IOC framework, but the examples are hopefully clear enough in their intent.
In the example below, we register singletons for each of the objects we want the container to be able to create: Point, Int, [Movement], IRobot and Simulator.
let registrar = ServiceRegistrar()
.registerSingle(Int.self) { _ in 2 }
.registerSingle(Point.self) { _ in Point(x: 0, y: 0) }
.registerSingle([Movement].self) { _ in [Movement(direction: .north, distance: 1)] }
.registerSingle(IRobot.self) { p in Robot(
speed: p.resolve(Int.self),
location: p.resolve(Point.self),
movements: p.resolve([Movement].self)
)}
.registerSingle(Simulator.self) { p in Simulator(p.resolve(IRobot.self)) }
This is a decent start, but many of the registrations above have no semantic meaning, like Int, Point and [Movement]. For these, it’s better to use higher-level abstractions.
We need to define three abstractions—called IOrigin, IRoute and IEngine—with implementations. The IRobot interface also needs to be redesigned to use them.
protocol IRoute
{
var movements: [Movement] { get }
}
protocol IOrigin
{
var point: Point { get }
}
protocol IEngine
{
var speed: Int { get }
}
protocol ISimulator
{
func run()
}
class Simulator : ISimulator
{
var robot: IRobot
init (_ robot: IRobot)
{
self.robot = robot
}
func run()
{
robot.move()
}
}
struct StandardRoute : IRoute
{
var movements: [Movement] = [Movement(direction: .north, distance: 1)]
}
struct StandardOrigin: IOrigin
{
var point: Point = Point(x: 0, y: 0)
}
struct FastEngine : IEngine
{
var speed: Int = 2
}
class Robot : IRobot
{
var location: Point
let engine: IEngine
let route: IRoute
init(_ engine: IEngine, _ origin: IOrigin, _ route: IRoute)
{
self.engine = engine
self.route = route
location = origin.point
}
func move()
{
for movement in route.movements
{
let distance = engine.speed * movement.distance
switch (movement.direction)
{
case .north:
location.y += distance
case .south:
location.y -= distance
case .east:
location.x += distance
case .west:
location.x -= distance
}
}
}
}
We’ve created concrete objects for our standard parameters. An added bonus of the improved semantics is that we can rewrite the init for Robot so that it no longer expects argument labels—because the parameters are now clear without further explanation.
Now we can take another crack at the configuration using these new types. This time, we’ll define an extension of the IServiceRegistrar that we can use again below.
extension IServiceRegistrar
{
func useSimulator() -> IServiceRegistrar
{
return self
.registerSingle(IEngine.self) { _ in FastEngine() }
.registerSingle(IOrigin.self) { _ in StandardOrigin() }
.registerSingle(IRoute.self) { _ in StandardRoute() }
.registerSingle(IRobot.self) { p in Robot(
p.resolve(IEngine.self),
p.resolve(IOrigin.self),
p.resolve(IRoute.self)
)}
.registerSingle(ISimulator.self) { p in Simulator(p.resolve(IRobot.self)) }
}
}
We’ve now configured a system that knows how to create our simulator along with all of its dependencies. You can see that if the ISimulator type is resolved from the container, it will create the Simulator, which requires an IRobot, which in turn requires an IEngine, an IOrigin and an IRoute.
An application can now change the speed of the robot without knowing anything else about the simulator, simply by changing the IEngine that’s used.
class SlowEngine : IEngine
{
var speed: Int = 1
}
let provider = ServiceRegistrar()
.useSimulator()
.registerSingle(IEngine.self) { _ in SlowEngine() }
.commit()
As well, any location in the application can use either the IRobot or the ISimulator without having to know anything about how either of the concrete objects is constructed. The simulator might be much more complicated than the very simple one defined above. The robot might do much more when asked to move.
What if we wanted to let the robot decide how fast it is, depending on what kind of robot it is? Or what if we want to separate the speed from being fixed in the IEngine?
What we need is a way to create transient objects that require parameters that are not available in the provider. These are types like Int, String, etc., as we had in Step Six above.
The example below shows a very simple usage of the factory pattern. Instead of having a single IEngine for the whole application, we want to provide settings that the robot uses to get its engine.
The code below sketches the new types and shows how the robot would use them.
protocol IEngineFactory
{
func createEngine(speed: Int) -> IEngine
}
protocol IRobotSettings
{
var speed: Int { get set }
}
class Robot : IRobot
{
init(_ engineFactory: IEngineFactory, _ settings: IRobotSettings, _ origin: IOrigin, _ route: IRoute)
{
self.engine = engineFactory.createEngine(speed: settings.speed)
// …
}
}
You’ll note that we didn’t declare any new properties. The robot still just has an engine, but asks the factory to create it based on a speed, rather than having the provider inject its singleton.
The robot’s speed can now be configured without replacing the entire implementation.
let settings = provider.resolve(IRobotSettings.self)
settings.speed = 10
let simulator = provider.resolve(ISimulator.self)
let robot = provider.resolve(IRobot.self)
simulator.run()
XCTAssertEqual(robot.location.x, 0)
XCTAssertEqual(robot.location.y, 10)
Published by marco on 4. Oct 2023 21:36:13 (GMT-5)
These are the two core principles that guide how we write code: keep the code as simple as possible, and build only what you actually need.
This first principle is a constant reminder to ourselves to avoid the seductive call of cleverness. Most code does not need to be clever. Very occasionally, it is necessary to implement something with real flair that requires explanation.
The best code, though, requires no explanation. The best code gets its job done in a very boring way, using the same patterns to achieve different ends. The best code is instantly recognizable to those who know the patterns. The best code doesn’t raise any questions. The best code doesn’t need comments. The best code is obvious and, yet, does amazing things—like fulfill requirements in a stable, predictable, testable, customizable and high-performance manner.
It’s kind of obvious: The lower the complexity, the easier it is to reason about systems. The easier it is to reason about a system, the easier it is to prove that either certain things can’t happen or will always happen. It should be obvious where to add a customization—because there’s only one place that it could logically go. It should be obvious where a bug lies—because there’s only one place it could have originated.
The best code is readable and understandable not only by the original programmer, but also by another programmer—even if that’s the original programmer, six months later.
We’d be lying if we said that we never write code that we don’t need, but we keep this principle in mind whenever we build code. There’s a bit more wiggle room when building frameworks vs. products. It’s easier to determine whether a feature is appropriate for a product than to do the same for a framework. Who knows how a framework might be used?
Encodo does have a framework named Quino. The point of a framework is to support the development of products that use it. It’s not easy to predict what those products might need, even when you’re focused only on features that your framework is supposed to provide. However, a framework or library has a purpose and it shouldn’t stray from it.
Just as an example: Does Quino provide a remote data driver? Yes, because products have used it and the feature fits into the strategy of metadata-supported data. Is there an XML transport protocol? No, because no-one needed it. Do we support any kind of object? Not out of the box, we don’t. You can register your own converters, but it’s not a generalized protocol.
At the very least, we stay away from throwing in everything but the kitchen sink—just in case a product that uses Quino might need it. Be prepared for anything, but build only what you need.
We apply the following principles to avoid unneeded complexity.
From the article Why OO Sucks by Joe Armstrong (inventor of Erlang).
“State is the root of all evil. In particular, functions with side effects should be avoided.”
The sentiment in the title is a bit strong, but it’s not unfair. OO programming mixes data with operations, leading to more complexity than required by the task.
Most applications need some state. That state should be isolated from most components. State should be stored in dumb objects and passed around.
A component without state is purely functional, drastically simplifying the things that could possibly happen to it. Its output is completely determined by its inputs. It does not introduce any threading issues beyond those inherent in its input.
A component avoids a whole class of issues if it cannot make changes to the data that flows through it. As with state, restrict mutability to only certain components.
For example, transient objects like DTOs or ORM objects are mutable because that makes the program logic much more understandable.
Another example is stateless singletons with configuration settings. Instead of using a single component with mutable properties, define the configuration in a settings component. This has several advantages: the service itself stays stateless, the configuration is explicit and lives in one place, and both are easier to test.
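As a rough sketch of that idea (the names and shapes here are invented for illustration, not taken from any product), a stateless service that receives its configuration as a dumb settings object might look like this in TypeScript:

```typescript
// A dumb, immutable settings object: data only, no behavior.
interface ReportSettings {
  readonly pageSize: number;
  readonly includeArchived: boolean;
}

// The service holds no mutable state of its own; everything it needs flows in
// through its constructor and method parameters, so it can safely be a singleton.
class ReportGenerator {
  constructor(private readonly settings: ReportSettings) {}

  generate(rows: readonly string[]): string[] {
    const source = this.settings.includeArchived
      ? rows
      : rows.filter(row => !row.startsWith("archived:"));
    return source.slice(0, this.settings.pageSize);
  }
}

// The composition root configures the service in exactly one place.
const generator = new ReportGenerator({ pageSize: 50, includeArchived: false });
console.log(generator.generate(["a", "archived:b", "c"]));
```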
If references are guaranteed to be non-null, whole swaths of checking code fall away and make the component much simpler. As with immutability, there are far fewer possibilities of what can happen to non-nullable code.
TypeScript supports a null-checking mode. C# supports one as well, starting with C# 8. For older versions of C#, use the JetBrains Annotations along with ReSharper to enable real-time/compile-time null-checking.
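For example, with TypeScript’s strictNullChecks enabled, only the boundary that can actually produce a missing value has to check; everything behind it can assume non-null references (a minimal sketch):

```typescript
interface User {
  name: string;
}

// The parameter is non-nullable, so no defensive checks are needed here.
function greet(user: User): string {
  return `Hello, ${user.name}`;
}

// Only the lookup that can fail returns a nullable type, and the compiler
// forces the caller to handle it before passing the value on.
function greetById(findUser: (id: number) => User | null, id: number): string {
  const user = findUser(id);
  return user === null ? "Unknown user" : greet(user);
}
```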
A method should either change state or it should return data. This is the idea behind CQS (the Command-Query-Separation principle). That said, we employ a weaker version where only visible state really counts.
Techniques like lazy-initialization and caching retrieved data are generally OK. Technically, those behaviors have non-visible state in the sense that they affect performance, but are still OK if used carefully.
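A small sketch of that weaker version (the names are invented for illustration): the method below is a query as far as its callers are concerned, even though it lazily caches its result internally (non-visible state):

```typescript
class ExchangeRates {
  private cache: Map<string, number> | null = null;

  constructor(private readonly load: () => Map<string, number>) {}

  // A query: callers cannot observe the lazy initialization, so the method
  // still behaves as if it were side-effect-free.
  rateFor(currency: string): number | undefined {
    if (this.cache === null) {
      this.cache = this.load();
    }
    return this.cache.get(currency);
  }
}

// Usage: the expensive load runs at most once, but the caller never notices.
const rates = new ExchangeRates(() => new Map([["CHF", 1.0], ["EUR", 0.96]]));
console.log(rates.rateFor("EUR"));
```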
We use C# and TypeScript—wonderful OO languages with strong functional support—but we’re using less and less of what OO has to offer.
Virtual methods are a code smell. Instead, use smaller, testable components with a single purpose. If it’s easier to test, it’s easier to replace where necessary. Smaller components are more focused and easier to replace without duplicating code.
If logic is separated from data, and services are injected or passed as parameters, then there is less and less need for base classes with many helper functions or virtual/protected methods.
If state just flows through a component, then that component can be a singleton, avoiding needless allocation.
It’s a lot easier to reason about an application that comprises a graph of singletons with transient data flowing through it.
Inject factories to create transient services (e.g. a remote-method caller that captures state).
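A hypothetical sketch of that last point (the names are ours, invented for illustration): the long-lived service depends only on a factory, and the short-lived object it creates is the one that captures per-call state:

```typescript
// The transient object captures per-call state (here, a request id).
interface RemoteCall {
  send(payload: string): Promise<string>;
}

type RemoteCallFactory = (requestId: string) => RemoteCall;

// The service itself stays a stateless singleton; it only holds the factory.
class SyncService {
  constructor(private readonly createCall: RemoteCallFactory) {}

  push(requestId: string, payload: string): Promise<string> {
    const call = this.createCall(requestId); // transient, created per operation
    return call.send(payload);
  }
}
```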
As you can see, we put a lot of thought and care into our development practices and patterns. We try really hard to work in a way that ends up with quality software: stable, maintainable, extensible, testable and, most importantly, does what it’s supposed to.
For more information about specific development patterns, please see the architecture section of the Quino conceptual documentation. There are sections on interfaces, base classes and virtual methods, providers, tools & toolkits, task-specific interfaces and much more.
Published by marco on 4. Oct 2023 21:36:07 (GMT-5)
Tests are code. Writing tests is not a “step”—it is part of writing the code itself. The component is nothing without its tests.
It should be easy to verify any requirement with a test. The tests should tell the story of the requirements.
A developer can test any component in isolation (unit testing) or can test the component in the constellation in which it normally exists (integration testing).
Just so we’ve said it: tests are not a place to use a different coding style or different coding practices than in “regular” code. Choose your frameworks wisely. It should be easy to write powerful, elegant and easily understood tests. Build your own support code and libraries where needed. Apply the same coding principles as you would with the code being tested. You have to maintain testing code just like any other code.
We discuss below that we prefer integration tests to unit tests—that only works if you provide a way to write high-performance integrated tests without repeating a lot of code.
Unit tests are very easy to write for properly written components. With a proper infrastructure, such tests can just as easily be executed in an integrated environment. In such cases, there is generally no need to invest time (and incur maintenance debt) writing two sets of tests.
Automated tests will sometimes replace components and dependencies with fake or mocked objects, in order to isolate and test only a component’s logic without incurring the costs of configuring and setting up unrelated components.
If integration testing is too complicated or too slow, then a web of unit tests may suffice. In most cases, though, this doesn’t apply and we avoid mocking entirely and test components directly in common, integrated settings.
For example, if a component is commonly used as part of a database-based application, then it is more effective to test that component in such a scenario, rather than expending effort in isolating the component in order to have a “true” unit test.
With only unit tests, there is a danger that the component works, but only as tested, not as actually used.
Often, these problems arise in component configuration. A unit test will pass in carefully prepared (and sometimes faked) dependencies and run all-green.
However, an integration test will check that the configuration code also works. That is, that the component is configured correctly for products that use it and not just in the tests that verify its behavior.
Mocks and fakes must be used judiciously, otherwise you end up either testing only the mock or you end up hiding certain classes of problems, as discussed in more detail below.
Imagine a UI list that validates and saves entries when the focus changes. This list might work just fine in a test, where notifications and side-effects as a result of saving are disabled with mocks.
This is no longer the real-world situation, though. What happens if one of the notifications would have led to a reload of the list or a state change in one or more objects? What if the list only saves an object if it is marked as “changed”, but a spurious event resets that status in the integrated environment? This kind of interaction—this kind of bug—represents exactly the kind of thing we would miss when testing the list in too isolated a manner.
Because we’ve mocked away too much—because we focused too tightly on a unit test of the list—we’ve missed a bug that will come up in production instead.
While we don’t practice strict TDD at Encodo, we do write tests from the very beginning.
It’s really the only way to test the code that you’re writing, isn’t it? What are you going to do instead? Fire up the web server each time you want to throw data at a controller? Use a browser or Postman to fire those requests? Or are you starting a desktop UI and clicking around and typing? Or did you hack together a little console application in order to debug code?
Stop doing all of those things. Use a testing environment instead, so your product acquires a growing stable of automated, repeatable regression tests. It will become second nature to write tests to verify requirements about the components you write.
As we said above: the tests are part of the component.
A point made above is that unit tests are useful but they’re often not complete. Unit tests can fool you with excellent syntactic coverage but sub-standard functional coverage. We have many tools to measure the former, but only experience to measure the latter.
Sure, you’ve covered all of the lines, but did you actually choose a representative set of inputs? Are you making the right assertions? Did you actually test the requirements?
One technique that we use a lot is expectation files (called snapshots in some frameworks). Instead of writing several (sometimes, dozens of) assertions, we format output to text and then compare it against the text produced by the previous, presumably correct test run.
The idea is to detect when something has changed. We use this in Quino to verify log output during certain operations, or to verify queries or generated SQL or model structure or lists of data. Expectation files increase the depth and robustness of tests while at the same time making it extremely efficient to write and maintain such tests.
An expectation (or snapshot) is updated automatically when it changes and shows up as a difference in source control. If the change is expected, the developer commits it.
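Quino’s expectation files are our own mechanism, but the idea maps directly onto snapshot testing in frameworks like Jest. For example (generateSql here is a hypothetical function under test, not part of any library):

```typescript
import { generateSql } from "./queryMapper"; // hypothetical module under test

test("maps a filtered user query to SQL", () => {
  const sql = generateSql({ table: "users", where: { active: true } });

  // The first run records the snapshot; later runs compare against it.
  // An intentional change updates the snapshot and shows up as a reviewable
  // difference in source control, just like an expectation file.
  expect(sql).toMatchSnapshot();
});
```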
It takes a lot of experience to write just the right number and kind of tests. You don’t want to write too many tests: it’s code you have to maintain, after all. Also, it can be confusing when the same problem crops up in multiple places in different fixtures.
Some components should have unit tests as well as integration tests. For other components, unit tests are redundant because the integration tests cover everything already. Experience guides you in deciding what to write first, what to keep, and what to throw away.
It is possible to have too many tests. If you’re not aware of which layer your code resides in, you might end up running the same code in multiple scenarios, even though that component behaves the same regardless.
For example, if you’re testing how expressions are mapped to a database, then that test should definitely run against every supported database. If you’re testing how a high-level query composes those expressions before they get to the mapper, then you only really need to run it against one database in integration.
No-one wants to admit to releasing untested software. And no-one really wants to do manual testing. Automating tests reduces turnaround time for changes and enhancements. It also increases confidence for quick turnarounds when going to manual testing or production.
Unit tests are good, but prefer coverage in integration tests so that you have the best guarantee that your tests are running your code in a way that emulates the production environment as closely as possible.
Published by marco on 4. Oct 2023 21:36:01 (GMT-5)
Good documentation is part of every piece of quality software. What do we mean by “good”, though? Or “documentation”, for that matter? Quality software should be self-explanatory, but don’t be fooled into thinking that you don’t need to write documentation.
Documentation has an audience. Before writing anything, consider who you’re writing it for. What are the possible audiences?
Evaluators are interested in what your software does, how it interacts with other software, its performance characteristics, system requirements, the product roadmap, open issues and so on. If you don’t document your software sufficiently, an evaluator won’t purchase it in the first place.
By “purchase”, we mean that an evaluator will decide to use your software. This applies not only to commercial projects, but also to open-source freeware or even internal company software, be it a potentially time-saving Excel spreadsheet, a set of common UI or server components or an enterprise-wide multi-tier application.
Installers are interested in the basic installation options/paths and how to get from purchase/download to running. Here you need to find a balance between getting them up and running quickly, but also informing them that there is more to your product than just the standard rollout. They need to know that they can get set up efficiently but also that they’re not locked in to a single way of doing things (unless that’s what you’re selling).
Customizers are advanced installers: they want to know how to tweak or customize an installation to meet their special needs. These are often the same people as installers, but with requirements that go beyond the standard rollout.
New users are going to use installed/customized software. They want to not only know what your software does, but how they can use it for these standard tasks. They are interested in underlying concepts in both the application domain and the user experience. They need both introductory and high-level documentation, with meticulous, step-by-step instructions. These users are likely to navigate documentation in a progressive manner, reading from beginning to end.
Everyday/experienced users aren’t generally interested in introductory documentation. They are interested in how to become more efficient with your software. They will jump around in the documentation, using a search function to find what they need.
Extenders are users—usually developers—who will be using your software as a building block, integrating it with other software or extending it to meet their needs. These users are interested in command-line options as well as descriptions of available APIs. If the API surface is larger, then functionality should be grouped and examples included to demonstrate how to use the various calls in common workflows.
Last but not least, you have to document for developers. That means writing your code and documenting it in a way that is understandable not only to you but also to other members of your team. Future members of your team will also need to get up to speed. As is often the case, you yourself will be one of those future developers, when you come back to a project or product after a longer absence. Your future you will definitely thank you for leaving well-documented clues.
Wow! That seems like quite a lot of documentation to write. It is. As with anything else, you’ll have to prioritize. We can make a list of the various documentation types we have at our disposal and identify the actors that would use them.
As you can see, we consider anything that helps actors to understand the software to be documentation. That means that writing useful error and logging messages is also an important way of documenting the product. Similarly, a clearly defined roadmap with stories/bugs/todos provides context for evaluators and developers. All of these forms of documenting a product can save everyone a lot of time, money and confusion by offering context-sensitive documentation right where it’s needed.
This extends to everything in your software or product: the best documentation is a good design. If the UX is more intuitive or command-line help is clear or the APIs are consistent and well-organized, that can go a very long way already. There is less need for extensive tutorials explaining each and every task when the product documents itself.
For example, if you name an API getUsers() and an input variable includeAdministratorUsers, then you don’t need to write much more than “Gets a list of users, optionally including administrators.”
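A hypothetical TypeScript version of that API (the User shape and data are invented for illustration) is close to self-documenting:

```typescript
interface User {
  name: string;
  isAdministrator: boolean;
}

const allUsers: User[] = [
  { name: "alice", isAdministrator: true },
  { name: "bob", isAdministrator: false },
];

/** Gets a list of users, optionally including administrators. */
function getUsers(includeAdministratorUsers: boolean = false): User[] {
  return allUsers.filter(user => includeAdministratorUsers || !user.isAdministrator);
}
```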
For those reasons and many others, we recommend getting started early with documentation. If you look at the list above, that’s kind of obvious advice.
Most importantly, the simple act of trying to describe what you are making will lead to a better product. You’ll often find that, as you document, you’ll notice things that could be done better or more intuitively or more consistently or more easily. If you find it relatively quick and easy to write documentation, then there’s a good chance that you’ve managed to build quality software.
Published by marco on 4. Oct 2023 21:35:54 (GMT-5)
An important part of the software process is the final step: delivery.
If you can’t get your software into your customer’s hands, then what’s the point of writing it at all?
There are several at-times cross-cutting goals. In descending order of importance, they are:
There are several aspects to continuous integration and delivery:
As expected, working in an organized manner with increased automation has clear benefits.
There are obviously limitations as well. The most immediate one is infrastructure investment: you have to set up build servers or purchase them in the cloud. You also have to make your process work with automated builds and possibly retrain personnel to work with it.
You have to plan your project and you have to have patience on the part of all stakeholders. You have to train everyone on the team to not even consider releasing a version of the software from a developer PC.
Setup and maintenance of build agents takes time and effort, especially over longer periods of time. Operating systems are upgraded, core components changed, build systems upgraded. All of these things will cause the build to fail on a given agent, even though nothing is actually wrong with the product. Here again, though, the agent will act as a canary in the coal mine for your development team. More often than not, the build-server failure will alert the team to avoid a feature that would have otherwise cost them time to integrate before it’s ready.
The type of deployment depends on the product.
For desktop software, you need to build an installer or a compressed archive that users can execute and install. Mobile or UWP applications must be built and then delivered to app stores for installation. Web servers and sites can be deployed directly to in-house servers or into the cloud (e.g. AWS or Azure).
These deployment types are for the end users, but there are many more releases than that. Developers need to test their changes locally. Testers need to get these versions in order to provide feedback in a timely manner. We think of all of these releases as part of the build infrastructure, not just the continuous-integration server delivering an end-product.
At Encodo, we have experience with various systems for various types of software. We started off using Jenkins but moved to JetBrains TeamCity several years ago. Web projects have their own packaging and testing mechanisms (e.g. WebPack, Mocha) that integrate into almost any build infrastructure. We’ve also used Fastlane combined with Test Flight for mobile deployment. Our main expertise lies with configuration of .NET deployments paired with TeamCity.
Published by marco on 4. Oct 2023 21:35:48 (GMT-5)
Design by Contract is a software engineering practice in which software requirements and promises − the “Contract” − are explicitly written into the code. The code is, at the same time, better documented, more reliable and easier to test against. Encodo uses this technique to ensure software quality.
A software contract is composed of several components: preconditions, postconditions and invariants. Preconditions are what a component requires of a client, whereas postconditions are what a component guarantees to a client. In object-oriented programming, these contracts are attached to method calls in a class. Invariants are a list of conditions that must always be true for software. An invariant is typically attached directly to a class; the runtime checks the class invariant when entering and exiting a method call.
Popular programming languages, like Java, C#, Delphi Pascal and others, lack the language constructs needed to express these contracts. However, these languages contain assertion constructs, which allow one to roughly describe the contracts. The section on emulating contracts in other languages shows the most common technique.
Eiffel is a language whose inventor, Bertrand Meyer, pioneered Design by Contract. It includes rich support for expressing contracts, is similar to Pascal in syntax and will be used for the examples below. The FAQ offers more information on why we chose Eiffel for our examples.
The best way to show how the use of contracts affects software is with an example. Imagine a database connection class with a method Open. This opens a connection to the database, allocating resources for it and failing if the request is refused.
Open is
do
-- Execute code to open the connection here
end
Any procedural programming language is capable of formulating the code above. However, what happens if Open is called twice in a row on the same connection? One way to handle this is to simply ignore subsequent calls to Open.
Open is
do
if not IsOpen then
-- Execute code to open the connection here
end
end
This is not optimal, for several reasons; most obviously, clients that erroneously call Open twice will never know they are doing so.
Another way to respond is to accept that this might happen, but to make it non-silent, logging the occurrence to some sort of logging mechanism.
Open is
do
if not IsOpen then
-- Execute code to open the connection here
else
-- Log a warning
end
end
This is slightly better and an entirely appropriate solution in some cases. However, the connection is quite a low-level component; it should not be responsible for deciding what to do about repeated calls to Open. We can use a contract to push the responsibility onto the client.
Open is
require
not IsOpen
do
-- Execute code to open the connection here
end
The require clause contains optionally named boolean expressions. If one evaluates to false, a precondition violation is signaled. The violator can immediately be pinpointed and repaired to conform to the contract (by adding a check for IsOpen before calling Open). What are the benefits?
The contract for this routine is not complete, as it has only published its requirements, but said nothing about guarantees. Given the name of the function, we would expect it to have the following postcondition:
Open is
require
not IsOpen
do
-- Execute code to open the connection here
ensure
IsOpen
end
The function is now completely defined, having explicitly detailed its requirements and guarantees. The postcondition often looks quite superfluous: the code for opening the connection is right above it, isn’t it?
Not necessarily.
If the function is deferred (abstract in Java and Pascal, virtual in C-style languages), the implementation is in a descendent. The pre- and postconditions apply to the redefinitions as well. This allows a base class to very precisely define its interface with other classes without making any decisions about implementation.
Open is
require
not IsOpen
deferred
ensure
IsOpen
end
The precondition can only be expanded in a descendent, whereas the postcondition can only be further constrained. That is, a descendent cannot define the precondition to be not IsOpen and DatabaseExists. A client with a reference to the ancestor class sees only the ancestor precondition and cannot be forced to conform to a contract defined in a descendent.
Likewise, the postcondition cannot be redefined to be IsOpen or ActionFailed. The original interface has already decided that if the database cannot be opened, the implementation must raise an exception. A client with a reference to the ancestor class does not have access to the ActionFailed feature and cannot accept this as a valid postcondition.
The descendent adjusts the precondition in a function like this:
Open is
require else
AutoCloseIfOpened
do
-- Execute code to open the connection here
ensure then
not CompactOnOpen or DatabaseIsCompacted
end
This descendent has expanded the precondition to allow a caller to call Open repeatedly only if IsOpen is false (inherited precondition) or if the AutoCloseIfOpened option has been set. Likewise, it has further constrained the postcondition to promise that, in addition to IsOpen being true (inherited postcondition), the database will be compacted if the CompactOnOpen option is set.
So, that’s Eiffel. How can other languages express contracts without the proper language constructs? As mentioned above, almost all modern languages include an assert function, which accepts a boolean expression and raises an exception if it is false. This function can emulate pre- and postconditions, but class invariants are largely impractical in languages without some form of pre-processor (a search for Design by Contract in C++ turns up several such libraries). Here’s Listing 5 written in Delphi Pascal:
procedure Open;
begin
Assert( not IsOpen );
// Execute code to open the connection here
Assert( IsOpen );
end {Open};
Note how the contract is expressed in the implementation body; this makes contract inheritance difficult. The following pattern illustrates a single level of contract inheritance (which prevents descendants from removing contracts by not calling inherited methods):
procedure Open; // Not overridable
begin
Assert( not IsOpen );
DoOpen;
Assert( IsOpen );
end;
procedure DoOpen; virtual; abstract;
Under this pattern, descendants are required to implement DoOpen and cannot alter Open (Delphi methods are static by default − equivalent to final in Java, sealed in C# or frozen in Eiffel). There are naturally drawbacks to this approach, especially when compared to the rich contract syntax available in Eiffel, but the technique is sufficient for many of the desired contracts.
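The same emulation works in the languages we use day-to-day. A minimal TypeScript sketch, using a hand-rolled assert helper rather than any particular contracts library:

```typescript
function assert(condition: boolean, message: string): asserts condition {
  if (!condition) {
    throw new Error(`Contract violation: ${message}`);
  }
}

class Connection {
  private opened = false;

  get isOpen(): boolean {
    return this.opened;
  }

  open(): void {
    assert(!this.isOpen, "connection must not already be open"); // precondition
    // Execute code to open the connection here
    this.opened = true;
    assert(this.isOpen, "connection is open on success"); // postcondition
  }
}
```

As in the Delphi version, the contract lives in the implementation body, so contract inheritance still requires a non-overridable wrapper like the DoOpen pattern shown above.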
See the further reading below to learn about using old in postconditions and about expressing class invariants.
“Why is there no try .. finally to ensure that the postcondition is checked in Listing 8?”
A postcondition is only guaranteed when the function exits successfully. In the example, it is perfectly legitimate for Open to fail because of an external connection problem. The precondition only guarantees that the connection is not open, not that it can be opened. Such guarantees are useless because they involve performing the action in order to check that the action can be performed.
The function should raise an exception if it cannot open the connection, avoiding evaluation of the postcondition and resulting in an acceptable error condition. An implementation that fails silently will cause a postcondition violation, which is an unacceptable error condition.
Using a try .. finally construct to force evaluation of the postcondition under all circumstances would result in both the desired error (connection could not be opened) and a postcondition violation, which is not correct.
“What if there is an exit or return statement in Listing 8?”
Question 1 proposed using a try .. finally construct to ensure that the postcondition was always executed. As you can see from the answer, this has undesirable side effects. The simple answer is not to use instructions that break the normal instruction flow (e.g. exit or break). The usefulness of such constructs is debatable and the drawbacks are high (especially, as shown above, when the instruction avoids checking contracts).
This exposes the weakness of languages without explicit contract constructs — it requires discipline to avoid bad practices. Relying purely on discipline invites error. However, it is better than nothing at all.
Published by marco on 4. Oct 2023 21:35:39 (GMT-5)
A healthy and active review culture is essential for any team interested in building quality software. At Encodo, we’ve been doing reviews for a long time. They’ve become an essential part of everything we do:
What we mean by review is not a formal process at all. It is simply that you prepare work you’ve done for an informal presentation to a team member. Explaining what you’ve done in a review is often a good way of collecting your thoughts—you should be able to explain what you’ve done. Getting a review from a colleague is an efficient and productive way of making sure you can do that.
While there are many reasons to do reviews, we’ve also learned that reviews can’t do everything.
It’s important to get reviews often enough to avoid wasting time and effort but not so often that your work or the reviewer’s work grinds to a halt. It’s all about balance.
A good rule of thumb is about one review per task. If your task is longer than a day, then think about how to break up that work into phases in order to get a review of earlier phases.
That way, you’re more likely to catch issues before building on top of mistakes.
Encodo prefers live, face-to-face reviews.
This is the most efficient manner of reviewing as neither party has to prepare anything other than the work to be reviewed. Issues that come up can often be handled immediately—and such issues are far more likely to be mentioned and fixed. While in-person reviews are superior, video-chat/shared-desktop reviews work quite well, too.
If that’s not possible, then we have also used tool-based, asynchronous reviews, such as pull requests with review software. However, we find these to be not only less efficient but also less likely to find as many issues.
With a live code review, it’s relatively easy to ask the submitter to reorder, split or squash commits. It’s also easier to point out and quickly fix stylistic issues (like naming or interface usage, etc.). Because the turnaround time is much faster, a reviewer is far more likely to point out smaller fixes that would improve code quality, maintainability and so on.
However, in an asynchronous review, a reviewer must decide what is most important. Is it worth rejecting the whole pull request if it’s 95% correct with a few details? Do you reject it and ask the submitter to fix up spacing or formatting or missing documentation? Do you really write down every last little thing you would have said? Do you reject it and hope that the submitter understands all of your notes? Or do you accept it and just fix those things up yourself? How many iterations do you go through?
We prefer synchronous, face-to-face reviews because they’re much more efficient. Misunderstandings can be cleared up quickly, iterating until the submitter and reviewer find a consensus.
We encourage reviews everywhere because we know how to make them faster.
Both the reviewer and the submitter need to practice. A reviewer should practice diplomacy and formulate critique in a way that it will be accepted. A submitter must keep an open mind and prepare good arguments or justification for the code. Both sides should stay positive. A review shouldn’t be a competition: it’s about producing high-quality code together, as a team.
Encodo has done presentations on reviews, in both English and German.
Published by marco on 4. Oct 2023 21:35:31 (GMT-5)
This article is part of an archive of Encodo White Papers.
What is the best approach when designing a new application, be it a small tool or an end-user application?
Many developers jump straight into a prototype, in order to get a feel for how the application will work. While prototypes are good for demonstrations, they are dangerous: in projects with tight time or budget constraints, the temptation to simply “build out” the prototype becomes irresistible. This leads to applications with nice user interfaces (hereafter called UI), but inflexible and difficult-to-follow implementations.
A better first step is to list the requirements and assign them to possible components. This doesn’t have to be a long or complete evaluation of the requirements; a few minutes is enough to come up with enough ideas to get started coding. These non-UI components are a natural fit for testing environments and are more likely to define a clean, sensible API (Application Programming Interface). Once the core logic has been built and tested, a prototype can easily be built on top of it.
To summarize, the component-based approach is important for the following reasons:
A good UI library is a wonderful thing, allowing clean-looking, well-integrated applications to be built in a very short time. However, the allure of this style of programming is dangerous, as it quickly leads to applications without a clearly defined API, which leads to extensibility and maintenance issues.
These systems entice programmers into working “backwards”, building their application logic around events generated by the UI. The first generation of RAD environments were notorious for mixing UI and business code. The latest generations make use of libraries with “code-behind” built right in, automatically supporting core/UI separation in both web and classic UI applications.
This separation of core logic from UI events is commonly called the MVC or Model-View-Controller pattern.
MVC is the official name for the technique described above, in which functionality is contained in a model (M), which communicates state changes to a view (V) through some form of update mechanism. The controller (C) represents user input and applies changes to the model.
In many UI libraries, the view and controller layers are merged, making it much easier to apply the pattern to smaller projects. View components are typically bound to model components using the Observer pattern: the view “listens” for changes in the model and reacts accordingly.
Consider a tool which processes text files and generates output of some kind (perhaps PDF or CSV). The actual task doesn’t matter − this is the kind of tool that is often written in a seat-of-the-pants fashion, with the excuse that it is “faster” to get it done this way. Let’s take a component-based approach and see what we get.
What are the components of the system?
This list took only a few minutes to write and could have been written by anyone familiar with the project. The list contains only domain knowledge — there is no implementation-specific data. Having written down the requirements, we see that there is a need for an internal data representation, which will be used by the importers, exporters and actions. This is a facet of the design that might have gone unnoticed during prototyping, but would have been expressed implicitly nonetheless.
The list of features above is not an “over-design”, but rather an explicit expression of the specifications. While an implementation can avoid using importer, exporter and action components, these concepts are part of the design nonetheless: an implementation without them is simply more difficult to describe, understand and extend.
With a little bit of thought, we have designed a system that will scale to multiple import and export formats and even support multiple transformations. Writing the application in this way may involve marginally more initial work, but will result in a far more testable, extensible and reusable framework, decreasing maintenance and support time.
Another popular argument is the perceived reduction in programming efficiency. Applications or tools of the “throwaway” kind will take longer to develop when using a clean programming model. Whereas that may be true in the very short term, the majority of an application’s life span is spent in support and maintenance, which takes more time and energy if the application is poorly designed.
Though a throwaway prototype may be available marginally quicker, it will be of poorer quality. In addition, subsequent applications cannot benefit from its code. The biggest loss comes in the form of functionality, improvements or bug fixes which are never even attempted because the code is not in a maintainable or testable state.
Realization of this design at the core level is not so difficult. Even though the application initially only has one importer and one exporter, it doesn’t take much more to define an API that supports multiple plugins. Writing the tests for these components is likewise trivial. The opposite is true in the UI: building an interface to manage and configure all of the functionality that was easily written into the model is prohibitive.
There is no reason, however, that the UI has to express all of the details of the underlying model; the application, as specified, need only expose enough functionality in the UI to be able to import and export. The UI stays remarkably simple, but can be easily and quickly extended to offer more features, if desired. Since the model has automatic tests, it can be assumed to be stable and it is easier to accurately estimate the time required to build the new GUI elements.
The standard, quick-prototyping approach would have started coding a main form with some input fields, building the transformation code directly into the form itself. Options and preferences would have likely been encapsulated with a few controls on the main form, which, in turn, would have been responsible for loading and storing them.
The design sketched above would be expressed implicitly and partially, at best. An application written without these concepts in mind will not be worth refactoring. If the code is re-used at all, it is typically copied to a new project and modified there, resulting in multiple copies of nearly the same code. Fixes and enhancements to one will not necessarily appear in the other.
A prototype that is considered “throwaway”, but grows into an application, does not benefit from any of the following:
It’s obvious from the design above that it can be extended to support multiple importers, exporters and actions. The initial application was assumed to be a GUI which did not expose all of the functionality available in the model. The GUI can be made more powerful, exposing more of the underlying functionality. The extensibility of the design is clear. What about reuse?
The examples below are in Delphi Pascal.
In a traditional prototype, command-line support is bolted on to the same application, because the required code is buried in UI structures. Such a command line application will involve something like:
if command = 'C' then begin
{ Create the main form first, so it is
treated as the main form by the system, then
hide both forms so they don't appear in front
of the command line.
}
form:= MainForm.Create;
form.Visible:= False;
prefsForm:= PrefsForm.Create;
prefsForm.Visible:= False;
prefsForm.LoadOptions;
form.EdtFileToUse.Text:= parameterFromCommandLine;
form.BtnConvertClick( nil );
form.Close; // Close main form to quit application
end;
Using the elements of the model from the component-based design, we could build a separate application, whose main loop is logical and readable:
if command = 'C' then begin
options:= ToolOptions.Create;
options.Load;
try
try
converter:= FileConverter.Create( options );
converter.Convert( parameterFromCommandLine );
finally
FreeAndNil( converter );
end;
finally
FreeAndNil( options );
end;
end;
The second version addresses the requirements in a much clearer, more maintainable fashion. On top of that, the implementation in the GUI application would have a similar pattern. The code above could go into an event handler, passing text from an input control instead of an argument from the command line. The following code assumes that the converter and options from the command line example above are globally available:
procedure MainForm.BtnConvertClick( Sender: TObject );
begin
Converter.Convert( EdtFileToUse.Text );
end;
With a small amount of time invested at the beginning, one can define any application in terms of UI-independent components. An application that was designed in this way lends itself to ready reuse. Applications that use these components need only be concerned with delivering input to a clearly defined API. Fixes and updates to the core components will be reflected in all applications.
Published by marco on 4. Oct 2023 21:33:33 (GMT-5)
Most people in the software industry have heard of test-driven development — it has become a buzzword with several possible meanings.
One of the more negative associations is the notion of unit testing. Unit testing traditionally involves writing a test for each and every routine in a unit or class, to ensure that it does what it claims. This practice has, of late, declined in popularity — mostly because of the sheer mindlessness of maintaining complete coverage of an ever-growing API.
Another form of testing is to write tests for components of a system, ensuring functionality on a higher level than that of the routine. Tests of this kind tend to encapsulate use cases, which are far more closely related to the way in which clients (actual users or other software) make use of an API. Naturally, use cases for extremely low-level components will end up testing individual routines, just as unit testing does.
Writing the component tests is not tedious and, in fact, helps tremendously in determining whether a piece of software is complete or not. They can be viewed as software implementations of the requirements documents or specifications. Proper application of Component-based Design makes it quite simple to build tests for the majority of an application’s functionality.
A far better tool for ensuring consistency at the lowest level, where unit testing traditionally comes into play, is Design by Contract. This practice involves including verification mechanisms directly in the software, so that violations of software contracts can be pinpointed and quickly repaired.
The most important element of any testing strategy is to stick with it. When a defect is found, the first step is to create a test to replicate the problem. The next is to fix the error so the problem no longer occurs, but all the other tests still work. Finally, any missing contracts that may have helped pinpoint the problem sooner should be added.
Once the test suite runs through without problems, the software is ready for release testing.
Automated testing is a fantastic way of guaranteeing baseline software quality, but it is not the last step before releasing a product. For server software or software with a command-line interface only, the test suite can provide an extremely high-level of coverage (approaching 100%). Software which interacts with humans, however, requires a manual testing regimen to verify that the software functions as desired for all forms of input. Whichever parts of the testing chain cannot be automated (UI testing is notoriously difficult) should be documented in detail to ensure reproducibility between releases.
Published by marco on 7. Sep 2023 11:02:51 (GMT-5)
“Can we make our UI dumb enough to make our app usable without it?”
The video demonstrates navigating through a simple e-commerce site. Then, the presenter (Michel Weststrate, the author of MobX) shows how the app can be driven from the console by calling the APIs directly—upon which the URL and UI all update automatically. That is, the logic is not in the UI.
He then demonstrates that he can drive the web site without a UI by deleting the rendering to React DOM entirely. He can still manipulate the console API to perform the same operations because the logic is all defined completely independent of the UI. Of course, this is the same command-line interface that can be used in the automated tests, which means that the entire product can be tested without a UI at all.
I’m becoming increasingly convinced that neither React nor Angular is the way to go. Both React and Angular mix logic into the UI, putting the UI front and center. This is wrong. Additionally, Angular suffers from a complete inability to speed up the development lifecycle because it’s so strongly tied to WebPack.
I’ve used Redux before and the boilerplate becomes prodigious. I’ve used the React reducers as well, and it’s a bit better, but still doesn’t feel very natural. I’ve used MobX but long before its current incarnation where it really seems to “just work” as a store of state and reactive programming logic.
The when construct (see 16:37 in the video), which takes a predicate and an action, is a very neat concept that allows you to define exactly how your application reacts to state changes without burying it all in the components.
“If the view is to be purely derived from the state, then routing should affect state, not the derived component tree.”
Therefore, a url-change is an action like any other, modifying the state and letting MobX handle notifying all interested parties. Once you’ve gotten that far, you don’t even need a UI-specific routing library because you can just configure any router to direct URLs to the store API—which will automatically update the UI. The UI (e.g., React) doesn’t have to have anything to do with routing. A route change triggers an action, which changes the state. The UI reacts. The UI does not do anything with the route—it just triggers actions.
A reactive non-UI component ensures that the route stays in sync with the state by reacting to changes in the state. In most cases, you can just create a value that calculates what the URL should be, based on the state. This could get complicated, of course, but it’s also completely separate from the rest of the application logic and can be thoroughly tested. We can also use the when construct outlined above to simply listen for changes to the calculated URL and update the browser’s location and history. This way, the management of the history and URL is not entwined with the rest of the application logic. It’s just reacting to state changes, like everything else.
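As a rough sketch of that idea in TypeScript (the store shape and route format are invented for illustration; only the MobX calls themselves are real), routing and URL synchronization stay entirely outside the UI:

```typescript
import { makeAutoObservable, reaction } from "mobx";

class AppStore {
  selectedProductId: string | null = null;

  constructor() {
    makeAutoObservable(this);
  }

  // Actions change state; the UI and the URL both just react.
  showProduct(id: string): void {
    this.selectedProductId = id;
  }

  // The URL is derived from state; nothing in the UI manages it.
  get url(): string {
    return this.selectedProductId === null
      ? "/"
      : `/products/${this.selectedProductId}`;
  }
}

const store = new AppStore();

// A reactive, non-UI component keeps the browser location in sync with the state.
// (reaction re-runs on every change; MobX's one-shot `when` fires only once.)
reaction(
  () => store.url,
  url => window.history.pushState(null, "", url)
);

// A route change is just another action: the router calls into the store API,
// and the UI updates because it observes the store, not the route.
function onRouteChange(path: string): void {
  const match = path.match(/^\/products\/(.+)$/);
  if (match) {
    store.showProduct(match[1]);
  }
}
```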
Working like this results in automated tests that work naturally and look very much like Playwright tests—but completely without UI and using semantically meaningful constructs. The UI is an afterthought (as Michel himself wrote in 2019). Playwright is nice, but it’s a last resort when you’ve already botched the job of writing your code in a more testable manner. It’s a nice check that the UI is properly wired to the logic of the application, but should not be used to verify application behavior—simply to verify UI behavior.
This all goes very much in the direction of The Humble Dialog Box by Michael Feathers in 2002, which shows that we’ve known how to build software correctly for over 20 years—and we keep getting distracted by “the new shiny”, thinking that we can somehow start with the UI and still get maintainable software.
Published by marco on 27. Aug 2023 03:32:42 (GMT-5)
The article Works on most machines by Mark Seemann (Ploeh Blog) argues provocatively that containers are a fallback for poorly written software.
“When you have general-purpose software, though, do you really need containers?”
Well, yes. The point isn’t that you need a container to paper over software that isn’t sufficiently generic: it’s to avoid fixing incompatibilities that have nothing to do with your target deployment systems.
I think the author is thinking too much of highly general-purpose software whereas the majority of software doesn’t need to run everywhere and anywhere.
If it’s built for the cloud, it’s going to run in a container anyway. If it’s built for a specific device, it’s going to run on that device.
In that case, why not just run that software on the developer's machine in the same environment? That way, you can avoid wasting a ton of time fixing problems that are related to how it runs in development rather than in production.
“Ultimately, you may need to query the environment about various things, but in functional programming, querying the environment is impure, so you push it to the boundary of the system. Functional programming encourages you to explicitly consider and separate impure actions from pure functions. This implies that the environment-specific code is small, cohesive, and easy to review.”
It implies it, but it in no way guarantees it. The author is also forgetting about the quality of the developer that is likely to be building the solution.
In this post, he assumes that the developer uses enough tests to thoroughly test the system—even to the point where he is able to determine where a solution isn’t sufficiently generalized yet. He assumes that the developer uses methodology like functional programming to separate pure from impure code, and that the developer is good enough to do all of this in a way that is both efficient and leads to a finished product.
This is not at all a guarantee—or even a likelihood—in the real world.
In the real world, developers are not reaching for the stars—even if they had the capabilities, which many do not, they’re often not given the time to do things correctly—they are just trying to get it done.
If they can “cheat” by restricting the world of possible environments—rather than accommodating their software to environments it will never encounter in production—then why not?
It's actually an engineering problem. If you're going to make something that has to work well underwater, the only reason it needs to work out of water is that it makes it easier to work on, not because you think it's worth the time making it function properly in air. If you could make it just as easy to work on underwater as it is in air, then you would just do that instead. Wouldn't you? Why waste your time and your company's when there's a lot of other, more important work to do?
Published by marco on 30. May 2023 22:04:15 (GMT-5)
Updated by marco on 11. Sep 2023 13:14:50 (GMT-5)
I watched a great video about image-manipulation using an AWS lambda function.
I was curious about the imaging library he was using and searched for ImageProcessingContext (because I saw it in his code). That led me to ImageSharp, after which I searched for comparisons to the cross-platform library used in Maui (MSDN).
That led me to the issue SkiaSharp vs ImageSharp (GitHub), which noted that,
“Note that JimBobSquarePants, the creator of ImageSharp, contributed some interesting discussion in #47.”
I read/waded through that whole issue thread and commented the following:
For future readers: The discussion itself is not very interesting, but the conclusion is. The title of the issue is Basic premise of the library is based upon a fallacy and harms existing projects. (GitHub) (referring to Maui.Graphics), which doesn’t feel super-constructive (and wasn’t). There are long screeds about how harmful MS is for everything OSS. The final comment is worth reading, as it explains that it turns out that the harshness of the issue title was completely unwarranted (as admitted by the original poster). Good conclusion; typically unproductive Internet discussion.
There is no conflict. Skia’s support for images is weaker than ImageSharp’s but it allows using GPU rendering on supported platforms whereas ImageSharp is for in-memory data (CPU-bound).
In the referenced issue itself, I commented,
“That’s wonderful. While I’m happy to learn that the issue was resolved, is there any way that we can pin this comment to the top so that future readers don’t have to wade through the 80% catfight in the middle?
“I was linked to this issue while researching Skia vs. ImageSharp and found the initial question and a couple of responses interesting, then waded through 80% chest-thumping, then finally got to this comment that essentially says “hey, we actually talked to each other and it turns out it was a tempest in a teapot”, which is what I was hoping to learn.”
I just got a response today:
“No way to pin comments, but I added a link to that comment from the initial issue description.”
Nice! 👌❤️🔥
Published by marco on 28. Mar 2023 22:15:01 (GMT-5)
The intended audience of this document is people interested in knowing which commands to execute to update submodules. The initial analysis section is intended for people interested in knowing how the commands work and what their strengths/weaknesses are.
The inspiration for this documentation was that I was wondering whether submodules were always cloned with detached heads and if there were some way to avoid that. The short answers to these questions are, respectively, “yes” and “no”.
Skip to the examples below to just see the commands and their effects.
At the end of the document are links to pages referenced to produce this documentation.
In the discussion below, the term superproject refers to the root repository that contains submodule references. It comes from the git documentation, where they make the distinction because submodules can be nested. Suppose we have multiple levels of nesting, as shown below.
📁 A
  📁 B
    📁 C

- A is the root repository of both B and C
- A is the superproject of B
- B is the superproject of C
Submodules are stored inside another repository.
For a simple setup, we would see the following:

📁 A
  📁 .git
    📁 modules
      📁 B
        📄 config (worktree = ../../../B)
  📁 B
    📄 .git (points to ../.git/modules/B)
The submodule's .git folder is stored in the superproject's .git folder and is replaced by a file that references the new location. The submodule uses the worktrees feature to check out to a different folder.
No. Storing the working tree of the submodule outside of the repository is not supported.
Why would you want to do that anyway?
One use case is that you have two repositories, each of which includes the same submodule, as shown below.

📁 A
  📁 B
📁 C
  📁 B
Instead of using two copies, you might think you could make the superprojects refer to the same copy of the submodule.
📁 A (refers to ../B)
📁 B
📁 C (refers to ../B)
With this setup, a commit made to B from within A would immediately be available in C. The problem is that A and C may need to refer to different commits of B.

Whereas you can manually move a submodule outside of the repository after you've cloned it, you cannot configure a superproject's submodules in a way that Git will be able to clone properly. If you try it, you'll probably get an error message like,
fatal: No url found for submodule path 'SUBMODULE.NAME' in .gitmodules
The next section explains how you can share local commits for testing.
Assume, as above, that there are two copies of the submodule, BA and BC. Suppose there are commits in BA that have been tested with A, but should also be tested with C.

One way to test C would be to push the commits in BA and then pull them from BC. That involves a round-trip to the server, which is not optimal, but relatively straightforward.

Another way to test C would be to add the local BA as a remote to BC and then check out the commit from BA directly.
To set up a remote called B_A in BC, execute:
git remote add B_A ../../A/B
The testing flow would be, roughly, as follows (a command-level sketch follows the list):

- Make a change and test it with A
- Commit #1 in BA
- Fetch B_A into BC
- Check out commit #1 in BC
- Test with C
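In terms of commands, and assuming the directory layout from above (A and C are siblings, the B_A remote has already been added, and <commit-id> stands in for the commit to test), the middle steps could look something like this:

git -C C/B fetch B_A
git -C C/B checkout <commit-id>

After that, build and test C against the checked-out commit.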
A clone of a superproject (a repository with submodules) fetches submodules only when required (e.g., when --recurse-submodules is included). If submodules are fetched, then git sets the checked-out commit in each submodule to the commit ID specified for that module in the superproject. This makes sense because that is the correct commit to use. However, this also means that, after a clone, all submodules will be in a detached head state.
On an initial clone, git creates a local branch in the superproject corresponding to the checked-out branch in the clone command (either the default branch or the branch specified in the -b option, if included).
Git does not create local branches in any of the submodules. Git assumes that you will be working in the root repository and not in the submodules. The checked-out branch in the submodule is irrelevant to the superproject.
If you want to work in (one or more of) the submodules anyway, then you have to create a local branch for yourself and check it out.
The detached head situation is not “weird” but “entirely expected” and “working as designed”. All detached head means is that a commit ID has been checked out rather than a named, local branch.
If, however, you want the submodule to be checked out to the same branch as that checked out in the superproject (e.g., main), then the way to address that is to call git switch main in the submodule repository.
This will have no effect on the superproject if the main branch in the submodule repository is at the same commit ID as the one pointed to by the superproject. If it is not, then switching to the main branch in the submodule repository will show up as a change in the superproject (the change being that the submodule repository is now pointing to a different commit). To accept that change in the superproject, simply git add the submodule folder and commit the change.
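For example, with a submodule checked out in a folder called SharedRepo (the folder name here is just illustrative), the whole sequence might look like this:

git -C SharedRepo switch main
git add SharedRepo
git commit -m "Point SharedRepo at the head of main"

The commit in the superproject records only the new submodule commit ID, not the submodule's contents.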
What does --remote-submodules do?

The --remote-submodules option does the following (according to the official documentation):
“Git will use the status of the submodule’s remote-tracking branch to update the submodule, rather than the superproject’s recorded SHA-1 (i.e. “commit ID”)”
That means that using this parameter may cause changes in the working tree of the superproject if the remote-tracking branch in the submodule repository does not point to the same commit as that referenced by the superproject.
The basic submodule registration looks like this in the .gitmodules file.
[submodule "SharedRepo"]
path = SharedRepo
url = git@ssh.dev.azure.com:v3/ustertechnologies/uster.quantum/PoC.IMHSharedRepo
If you don't plan on using --remote-submodules, then that's all you need.
However, if you want to set up your git submodules so that the superproject knows which branch it should “track” in the submodule, use the following configuration:
[submodule "SharedRepo"]
path = SharedRepo
url = git@ssh.dev.azure.com:v3/ustertechnologies/uster.quantum/PoC.IMHSharedRepo
branch = .
update = rebase
Note that the branch name is ".". This tells git to use the same branch name as that which is checked out in the superproject (if it exists; if it doesn't, then git does nothing further). This allows you to set up the .gitmodules once and it works as expected for all branches. Otherwise, you run the risk of merging in a .gitmodules file that references a specific feature branch (for example) and you end up syncing with that feature branch by accident if you call submodule update with --remote.
The update action indicates how git should get to the desired commit if it needs to make a change. Again, this only applies if you explicitly tell git to use the head commit for the given branch on the remote instead of just using whichever commit is already referenced locally.
A superproject will see an update if it follows a branch in the submodule (as outlined in the preceding section) and that branch in the submodule has gained new commits since the last time the superproject was updated (i.e., the superproject still references a commit in the submodule that does not correspond to the current HEAD of the branch in the submodule).
Using the --remote-submodules option is a way of cloning a superproject, but also updating its submodules to the latest commits instead of just checking out whatever is referenced in the superproject. It is a useful way of cloning a superproject with the latest commits in not only the superproject's repository, but also all submodules. However, you are then not only checking out the current state of the repository, but also requesting updates to the referenced submodules.
This only works if the submodule reference specifies a branch, though. If it doesn’t, then git has no way of knowing which branch in the submodule repository it should update to. As noted above, setting this branch doesn’t mean that git will create a local branch in the submodule with that name and check it out; it just means that it will change the commit ID referenced by the superproject for that submodule if the commit referenced by that branch in the submodule is different than the commit currently referenced by the superproject.
Phew! We now know enough to determine the commands to use.
We now have the base knowledge to work with git and submodules using the command line. This will be useful for e.g. setting up agents.
Imagine we have two repositories
The examples will use something like the following diagram to show results. The bold indicates the commit and branch that are checked out. A bold commit with a non-bold branch name indicates a detached head.
The diagram below shows the situation outlined above, with main checked out.
To clone a repository with submodules and check out the default branch in the superproject, execute the following:
git clone --recurse-submodules <URL>
This results in:
Using the example from the start of this section, after executing this command, we will see:
No change from the example is expected.
To do the same as above, but check out a particular branch, execute the following:
git clone -b feature/setup --recurse-submodules <URL>
This results in the same as above, but the superproject is checked out to “feature/setup”. Using the example from the start of this section, after executing this command, we will see:
To update submodules after an initial clone (not necessary immediately after a clone, of course), execute the following:
git submodule update
This results in:
Submodules where a change to the checked-out commit is required are in detached head state. If no change is made, then the submodule remains at whichever (possibly detached) commit or branch was previously checked out.
As with an initial clone, this command does not update any references to submodule commits.
To not only clone a superproject and all of its submodules, but to also update references to those submodules' latest HEADs (as outlined in the remote-submodules section above), execute the following:
git clone --recurse-submodules --remote-submodules <URL>
This results in:
If, for example, the remote branch main in repository B had been updated to BID2, then the reference from A to B would also have been updated to BID2:
To update submodules after an initial clone and update references (as outlined in the remote-submodules section above), execute the following:
git submodule update --remote
This results in:
As when calling clone with --remote-submodules, this command updates submodule references. Therefore, if the remote branch main in repository B had been updated to ID3, then we would expect to see A referencing that commit in B.
The following links were helpful in writing this documentation:
Published by marco on 17. Mar 2023 07:22:46 (GMT-5)
I’d watched an excellent movie [1] that was primarily in German but had some English parts, with hard-coded English subtitles and soft German subtitles plastered on top of that. I wanted to cite a bunch of interesting sections, so I looked for the subtitles online. Only the English subtitles are available, which I didn’t want. I liked the German formulation and wanted to cite that.
Well, I have the subtitles: they're just trapped in the mkv file. I figured that there was some way of extracting them, but a search turned up a lot of pre-compiled and sketchy-looking software whose trustworthiness I couldn't adequately validate. I want the subtitles, but I don't want to get a virus or be crypto-locked.
I got a good hint to use ffmpeg from How to Extract .SRT Files From MKV File (Reddit). It suggested something like,
ffmpeg -i FILENAME.mkv -map 0:s:0 german.srt
Once I'd installed ffmpeg with Homebrew, I was able to extract a subtitle stream. Unfortunately, it was kind of short, so I'd grabbed the wrong stream.
Part of the output of the command above is a list of available streams, shown below.
Stream #0:0: Video: h264 (Main), … (default)
  Metadata:
    DURATION : 01:25:55.332000000
Stream #0:1(ger): Audio: aac (LC), 48000 Hz, stereo, fltp (default)
  Metadata:
    title    : Stereo
    DURATION : 01:25:55.285000000
Stream #0:2(ger): Subtitle: ass
  Metadata:
    title    : German forced
    DURATION : 01:03:24.130000000
Stream #0:3(ger): Subtitle: ass
  Metadata:
    title    : German
    DURATION : 01:25:43.890000000
Stream #0:4(ger): Subtitle: ass
  Metadata:
    title    : German SDH
    DURATION : 01:25:43.890000000
The ffmpeg documentation isn't particularly illuminating on the -map option, but I finally figured out that a parameter like 0:s:1 breaks down as follows:

- 0 refers to the first input file (the one passed with the -i option)
- s indicates subtitles (I intuit this because it looks like p indicates programs, according to FFMPEG: How to chose a stream from all stream [sic])
- the final number is the zero-based index of the stream within that type

Armed with this information, I was able to select the second subtitle stream, which is the full German subtitles rather than just the German subtitles for the English parts.
ffmpeg -i FILENAME.mkv -map 0:s:1 german.srt
This gave me the desired subtitles in seconds.
Happily, I have what I want, and I didn't have to install any sketchy tools delivered as unvetted binaries. Instead, I'm comfortable installing the well-known tool ffmpeg using the well-known package manager brew.
Published by marco on 5. Mar 2023 21:23:29 (GMT-5)
Updated by marco on 9. Apr 2023 23:29:26 (GMT-5)
Testing is any form of validation that verifies a product. That includes not only structured validation using checklists, test plans, etc. but also informal testing, as when engineers click their way through a UI, emit values in debugging output to a console, or perform operations on hardware.
Automated testing is common for software, as regression-style tests that execute both locally and in CI. This includes unit, integration, and end-to-end tests.
The following discussion focuses primarily on _software testing_ but hopefully contains some insights and information relevant to other engineering disciplines (e.g., embedded and hardware developers).
Testing is primarily a mindset.
Thinking about what you're building in the terms outlined above can help you to determine how and what you're actually going to build. It will help you focus.
You should think of writing tests not as something you _have_ to do, but rather as something you _want_ to do.
Let’s define some of this jargon—use cases? “it works”, etc.—before we continue.
It’s a bit of a provocative question, perhaps, but it makes sense to ask about anything into which you’re going to invest time and money.
So, let’s start a bit further back.
❓ What would we like to do?
“We would like to build a product of high quality”
❓ What’s a product?
“A product is an implementation of a set of requirements.”
❓ Then what’s a requirement?
“A requirement is a collection of use cases.”
❓ OK, fine. What’s a use case?
“A use case comprises a set of initial conditions, an action, a set of inputs, and an expected output.”
❓ What is quality?
“A product that satisfies its requirements is of higher quality than one that does not.”
❓ How can I know that my product has the desired quality?
“We test use cases for a version of a product to determine quality.”
❓ How can I know when my product has enough tests?
“When all of the use cases are covered.”
❓ What if I change the product after I’ve tested it?
“Then you have to test all of the use cases again.”
❗ What the heck? That’s boring! I don’t have time for that!
“It’s called regression-testing. There’s no way around it.”
❓ What if I know that I’ve only changed a tiny thing?
“You might be able to get away with it. But that’s where 🕷 bugs come from.”
❗ I can’t afford to test everything manually every time I make a change!
“That's why you automate as many tests as you can.”
❗ Running the tests ties up my local machine! I can’t work.
“Run tests in another environment (e.g., in the cloud)”
We’ve established both that testing is a mindset and that it is necessary to building high-quality products.
We should keep in mind that the goal is to have a well-tested product with as many of these tests as possible being automated. The question is: how close to the goal state do you stay during development?
In other words, what does the development-feedback loop look like?
The goal of the development-feedback loop is to shorten the time between a change and its verification. In practice, this often manifests as “knowing as soon as possible when you’ve broken something.” The longer it takes from change to verification, the more likely it is that multiple changes will be verified at once. Root-cause analysis becomes more difficult.
That’s why manual tests are undesirable: they are far less likely to be run/applied in a timely manner, increasing the number of changes that have occurred since the last time tests were run.
So, the longer you wait to define tests, the longer your product remains untested. The longer you wait to automate tests, the longer you must do manual testing to verify behavior.
With that idea in mind, let's consider the spectrum of methodologies. At one end, there's TDD (test-driven development), where you write the tests first, letting them fail and then writing the implementation. At the other, there's writing all of your acceptance tests once you've finished the product.
Always writing the tests first is just one extreme, and one that scares a lot of people away from automated testing. As with any dogma, strict adherence is unlikely to be efficient.
Sometimes, you’ll need to try out an implementation to see if it’s even feasible or want to play with an API to see how it feels before you write a ton of tests for it. You don’t want to go too long without testing that you haven’t broken something, but you also don’t want to write tests for code that you’re going to throw away in an hour anyway.
Tests are only one part of the array of techniques a developer can use to verify a product. As discussed in more detail below, a strong type system, linting, and static-code analysis of all kinds help verify a product.
We should always be aware of which parts are necessary during which phases. If certain tools take longer to verify code, consider whether they need to be executed all the time, or perhaps just when pushing to a remote, or before merging into the master branch.
If you wait until you’ve finished the product to write all of your tests, you will still have a well-tested product, but you will not have benefited from testing during development.
Being able to test as you go improves your efficiency tremendously, as you’re not constantly fighting with things that are mysteriously breaking. Instead, you’re usually able to pin the blame on the most recent change you’ve made.
A product of nontrivial complexity can be written more reliably and quickly if there are tests. It also becomes possible for one team member to write the tests while another provides the implementation that satisfies it.
The spectrum in between is where most developers live, writing tests as they go, but not always before they’ve implemented something.
It's understandable that there will always be certain tests that are difficult, if not impossible, to automate. However, the document that follows will provide some tools for extracting the testable bits from the untestable ones to increase coverage. Anything that can be tested automatically can be executed by all team members all the time, as well as by pipelines in the cloud.
You’re almost certainly already testing.
You might be clicking through the UI or emitting statements in a command-line application, but you’re verifying your code somehow. I mean … you are, right? RIGHT?
I’m kidding. Of course you’re not just writing code, building it, and committing it. You’re validating it somehow.
That’s testing.
If you're really good, you might even keep a list of these validations. Once you have a list, you can run through it whenever you change something.
This is fine, but it's still a manual process, with all of the drawbacks that entails.
Automated testing means that you codify those validations.
Don’t panic. Almost any code can be tested. In fact, if you can’t get at it with a test, then you might have found an architectural problem.
See? Automating tests will even help you write better code!
Just start somewhere. It doesn’t matter where. Don’t worry about coverage. Just get the feeling for writing a proof about a facet of your code. Any bit of logic can—and should—be tested.
What if you still don’t know where to begin? Ask someone for help! Don’t be shy. It’s in everyone’s best interest for a project to have good tests. You want everyone’s code to have tests so you know right away when you’ve broken something in a completely unrelated area. This is a good thing!
A project should provide support for mocking devices and external APIs, or for using test-specific datasets.
A reasonably fast test suite will tend to be run more often. We would like a developer to notice a broken test right after the change that broke it, preferably even before pushing it.
Tests a developer runs locally should almost always work in CI. Failing tests in CI should also fail locally.
For example,
The following questions should help you evaluate for yourself where you are on your automated-testing journey.
We never want anyone in a team to get the impression that we’re writing tests just to write tests. We write tests because they help us write better code and because it feels good to be able to prove that something that was working continues to work. You should feel more efficient and productive and feel like you’re producing higher-quality code.
How do you know when there are “enough” automated tests?
Don’t get distracted by trying to achieve a specific coverage percentage. The most important thing is that the major use cases are covered.
If software is stable and there is “only” 40% test-coverage, then maybe there is a lot of code that rarely or never gets used. In that case, you might want to think about removing code that you don't need rather than wasting time writing tests for code that never runs.
New code, though, should always have automated tests. A code reviewer should verify that new functionality is being tested.
Unit tests cover a single unit, mocking away other dependencies where needed. They are useful for verifying simple logic like calculated properties or verifying the results of service methods with given inputs.

Integration tests cover multiple units, possibly mocking unwanted dependencies. They are useful for verifying the behavior of units in composition, as they will be used in the end product. The goal is to cover as much as possible without resorting to more costly end-to-end tests.

End-to-end tests, also called UI tests, verify the entire stack for actual customer use cases. They are very useful, but generally require more maintenance as they tend to be more fragile. They are essential for verifying UI behavior not reflected in a programmatic model and can work with snapshots (e.g., that an error label is red).
The article Write tests. Not too many. Mostly integration. describes a pragmatic approach quite well. Instead of the classic “testing pyramid”, it suggests a “testing trophy”.
This style of development has the following aims:
A project should include analyzers and techniques so that the compiler helps make many tests unnecessary. For example, if you know that a parameter or result can never be null, then you can avoid a whole slew of tests.
Developers should only spend time writing tests that verify semantic aspects that can’t be proven by the compiler.
The .NET world provides many, many analyzers and tools to verify code quality. One of the most important things a project can do is to improve null-checking. The best way to do this is to upgrade to C# 8 or higher and enable null-reference analysis. The default language for .NET Framework is going to stay C# 7.3, but you can enable null-reference analysis for .NET Framework quite easily.
Another option is to use the JetBrains Annotations NuGet package, which provides attributes to indicate whether parameters or results are nullable.
The preferred way, though, is to use the by-now standard nullability-checking available in .NET.
Doing neither is not a good option, as it will be very difficult to avoid null-reference exceptions.
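As a small, hypothetical sketch of what the compiler can then take over for you (the Greeter class and its methods are invented purely for illustration):

#nullable enable

public sealed class Greeter
{
    // With non-nullable reference types, the compiler warns at every call site that
    // could pass a null name, so there is no need for a test that passes null and
    // expects an exception.
    public string Greet(string name) => $"Hello, {name}!";

    // A nullable parameter documents that callers may omit the title, and the compiler
    // checks that this method handles the null case before using it.
    public string GreetFormally(string? title, string name) =>
        title is null ? Greet(name) : $"Hello, {title} {name}!";
}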
Unit tests are very useful for validating requirements and invariants about your code.
These are the easiest tests to write and will generally be the first ones that you will write.
A requirement or an invariant may be specified in the story itself, but it can be anything that you know about the code that's important. It's up to the developer and the reviewer(s) to determine which tests are necessary. It gets easier with experience—and it doesn't take long to get enough experience so that it's no longer so intimidating.
Just as a quick example in .NET, consider the following code,
public bool IsDiagnosticModeRunning
{
    get => _isDiagnosticModeRunning;
    set
    {
        _isDiagnosticModeRunning = value;

        _statusManager.InstrumentState = value ? InstrumentState.DiagnosticMode : InstrumentState.Ready;
    }
}
Here we see a relatively simple property with a getter and a setter. However, we also see that there is an invariant in the implementation: that the _statusManager.InstrumentState is synced with it.
Using many of the techniques described below, we could write the following test:
[DataRow(true, InstrumentState.DiagnosticMode)]
[DataRow(false, InstrumentState.Ready)]
[TestMethod]
public void TestIsDiagnosticModeRunning(bool running, InstrumentState expectedInstrumentState)
{
    var locator = CreateLocator();
    var instrumentControlService = locator.GetInstance<IInstrumentControlService>();
    var statusManager = locator.GetInstance<IStatusManager>();

    Assert.AreNotEqual(expectedInstrumentState, statusManager.InstrumentState);

    instrumentControlService.IsDiagnosticModeRunning = running;

    Assert.AreEqual(expectedInstrumentState, statusManager.InstrumentState);
}
Here, we're using MSTest to create a parameterized test that verifies both of these facts.
We now have code that validates two facts about the system. Should something change where these facts are no longer true, the tests will fail, giving the developer a chance to analyze the situation.
If you’re addressing a bug-fix, though, you might be able to prove that you’ve fixed the bug with a unit test, but it’s also likely that you’ll have to write an integration test instead.
Unit tests have their place, but they are far too emphasized in the testing pyramid. The testing pyramid comes from a time when writing integration tests was much more difficult than it (theoretically) is today.
The “theoretically” above means that the ability to write integration tests as efficiently as unit tests is contingent on a project offering proper tools and support.
One common complaint about integration tests vis-à-vis unit tests is that they run more slowly. Another is that they take longer to develop. Ideally, a project provides support to counteract both of these tendencies. To this end, a project should offer base and support classes that make common integration tests easy to set up and quick to execute.
There are many different ways to solve this problem, each with tradeoffs. For example, a project can load dependencies in Docker containers, either created and started manually (see Testing your ASP.NET Core application − using a real database) or even dynamically with a tool like the Testcontainers NuGet package.
A drawback to unit tests is that, while they can test an individual component well, it’s really the big picture that we want to test. We want to test scenarios that correspond to actual use cases rather than covering theoretical call stacks. It’s not that the second part isn’t important, but that it’s not as important.
Given limited time and resources, we would prefer to have integration tests that also cover a lot of the same code paths that we would have covered with unit tests, rather than to have unit tests, but few to no integration tests.
This, however, leads directly to…
The advantage of a unit test over an integration test is that when it fails, it’s obvious which code failed. An integration test, by its very nature, involves multiple components. When it fails, it might not be obvious which sub-component caused the error.
If you find that you have integration tests failing and it takes a while to figure out what went wrong, then that’s a sign that you should bolster your test suite with more unit tests.
Once an integration test fails and one or more unit tests fail, then you have the best of both worlds: you’ve been made aware that you’ve broken a use case (integration test), but you also know which precise behavior is no longer working as before (unit test).
Testing code is just as important as product code. Use all of the same techniques to improve code quality in testing code as you would in product code. Clean coding, good variable names, avoiding copy/paste coding—all of it applies just as much to tests.
There are two main differences:
This is a big, big topic, of course. There are a few guidelines that make it easier to write tests—or to avoid having to write tests at all.
As noted above, code that can be validated by the compiler (static analysis) doesn't need tests. E.g., you don't have to write a test for how your code behaves when passed a null parameter if you just forbid it. Likewise, you don't have to re-verify that types work as they should in statically typed languages. We can trust the compiler.
Here are a handful of tips.
See the following articles for more ideas.
Investigate your testing library to learn how to write multiple tests without having to write a lot of code. In the MSTest framework, you can use DataRow to parameterize a test. In NUnit, TestCase does the same thing, and Values allows you to provide parameter values for a list of tests that are the Cartesian product of all values.
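For example, an NUnit test along these lines would run once for each combination of its parameters (four times here); the DiscountCalculator type is hypothetical:

[Test]
public void CalculatesDiscountForAllCombinations(
    [Values(true, false)] bool isStudent,
    [Values(true, false)] bool isHappyHour)
{
    // NUnit generates one test per combination of the [Values] parameters.
    var discount = DiscountCalculator.Calculate(isStudent, isHappyHour);

    Assert.That(discount, Is.InRange(0.0m, 0.3m));
}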
Use mocks or fakes to exclude a subsystem from a test. What would you want to exclude? While you will want to make some tests that include database access or REST API calls, there are a lot of tests where you’re proving a fact that doesn’t depend on these results.
For example, suppose a component reads its configuration from the database by default. A test of that component may simply want to see how it reacts with a given input to a given method. Where the configuration came from is irrelevant to that particular test. In that case, you could mock away the component that loads the configuration from the database and instead use a fake object that just provides some standard values.
Another possibility is to fake an external service to see how your code reacts when the service returns an error or an ambiguous response. Without mocks, how would you test how your code reacts when a REST endpoint returns 503 or 404? Without a mock, how would you force the purely external endpoint to give a certain code? You really can’t. With a mock, though, you can replace the service and return a 404 response for a specific test. This is quite a powerful technique.
As noted above, it's much, much easier to use fake objects if you've consistently used interfaces. You can just create your own implementation of the interface whose standard implementation you want to replace, give it a fake implementation (e.g., returning false, empty strings, and null for methods and properties), and then use that class as the implementation.
If you have interfaces that perform a single task (single-responsibility principle), then it doesn’t take too much effort to write the fake object by hand. However, it’s much easier to use a library to create fake objects—and there are other benefits as well, like tracking which methods were called with which parameters. You can assert on this data collected by the fake object.
For .NET, a great library for faking objects is FakeItEasy.
With a fake object, you can indicate which values to return for a given set of parameters without too much effort. Similarly, you can use the same API to query how often these methods have been called. This allows you to verify, for example, that a call to a REST service would have been made. This is a powerful way of proving facts about your code without having to actually interact with external services.
The following code configures a fake object for ITestUnitConfigurationService that returns default data for all properties, except for Configuration and GetTestUnitParameterValues(), which are configured to return specific data.
private static ITestUnitConfigurationService CreateFakeTestUnitConfigurationService()
{
    var result = A.Fake<ITestUnitConfigurationService>();
    var testUnitParameters = CreateTestUnitParameters();
    var testUnitConfiguration = new TestUnitConfiguration(testUnitParameters);

    A.CallTo(() => result.Configuration).Returns(testUnitConfiguration);

    var testUnitParameterValues = CreateTestUnitParameterValues();

    A.CallTo(() => result.GetTestUnitParameterValues()).Returns(testUnitParameterValues);

    return result;
}
In the test, we could get this fake object back out of the IOC (for example) and then verify that certain methods have been called the expected number of times.
var testUnitConfigurationService = locator.GetInstance<ITestUnitConfigurationService>();
A.CallTo(() => testUnitConfigurationService.Configuration).MustHaveHappenedOnceExactly();
A.CallTo(() => testUnitConfigurationService.GetTestUnitParameterValues()).MustHaveHappenedOnceExactly();
You can avoid writing a ton of assertions and a ton of tests with snapshot testing.
For example, imagine you have a test that generates a particular view model. You want to verify 30 different parts of this complex model.
You could navigate the data structure, asserting the 30 values individually.
That would be pretty tedious, though, and lead to fragile and hard-to-maintain testing code.
Instead, you could emit that structure as text and save it as a snapshot in the repository. If a future code change leads to a different snapshot, the test fails and the developer that caused the failure would have to approve the new snapshot (if it’s an expected or innocuous change) or fix the code (if it was inadvertent and wrong).
The upside is that large swaths of assertions are reduced to a simple snapshot assertion. The downside is that the test might break more often for spurious reasons. Generally, you can avoid these spurious failures by being judicious about how you format the snapshot.
See the documentation for the Snapshooter NuGet package.
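With Snapshooter, such a test could look roughly like the following sketch; CreateOrderViewModel() stands in for whatever builds the complex model in your code:

[TestMethod]
public void OrderViewModelMatchesSnapshot()
{
    var viewModel = CreateOrderViewModel();

    // The first run writes a snapshot file next to the test; subsequent runs fail
    // with a diff if the serialized view model no longer matches the stored snapshot.
    Snapshot.Match(viewModel);
}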
There have been many solutions to the problem of automated testing of web UIs over the years. The one many know is Selenium, but tools like Cypress, TestCafe, Puppeteer, and Playwright have largely replaced it. The WebdriverIO library is another option in this space.
Before choosing a tool, you’ll want to consider what your requirements are:
The current front-runner for end-to-end testing is Playwright, an open-source cross-browser, cross-platform, cross-language testing framework.
This pattern is particularly useful when you have a bunch of steps to execute. Instead of executing the steps as you go, you build a plan that describes how those steps would be executed and return that as the result of the planner phase. You can test this plan very easily without worrying about how to mock away the mutating part of the code.
For example, suppose you want to sync an online data source with a local configuration. The classic way would be to do something like the following:
var items = GetItemsFromServer();

foreach (var item in items)
{
    var itemData = GetItemDataFromServer(item);

    if (string.IsNullOrEmpty(itemData.Text))
    {
        SetStandardText(item, itemData);
        SaveItemToServer(item);
    }
}
With so little logic, there’s really no way to question this setup, is there? But think about what happens if there are more decisions to make, more data to retrieve, more data to update on the server. As this logic increases in complexity, the mutating code becomes ever more deeply embedded in read-only logic. That read-only logic ends up being the lion’s share of the code that you want to test, but you have to step very lightly to avoid making changes on the server. You can, of course, mock away services, to make sure that nothing is communicated back to the server, but there is another way.
What if you were to consider the set of operations as phases?
This approach has several advantages:
Once again, we have a pattern that not only makes testing easier, but it makes the entire architecture more robust, opening up possibilities that you wouldn’t have with the straightforward pattern (which would be harder to test).
To finish up this section, let’s take a quick look what that could look like in pseudocode.
var items = GetItemsFromServer();
var commands = new Commands();

foreach (var item in items)
{
    var itemData = GetItemDataFromServer(item);

    if (string.IsNullOrEmpty(itemData.Text))
    {
        var command = CreateCommand(
            $"Set standard text for {item}",
            () =>
            {
                SetStandardText(item, itemData);
                SaveItemToServer(item);
            }
        );

        commands.Add(command);
    }
}

// Present commands to the user; store the commands for later, or execute them…
// This is where tests would verify the commands generated from a given set of
// item data.

foreach (var command in commands)
{
    try
    {
        command.Apply();
    }
    catch
    {
        // Log error and continue?
    }
}
Instead of executing the command immediately, we store what we would want to do with a closure and a description. We can do whatever we want with those commands; executing this is one option, but you can see how useful it would also be for verifying that the logic is correct in tests.
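The Commands collection and the CreateCommand helper are left undefined in the pseudocode above; a minimal way to flesh them out might be:

using System;
using System.Collections.Generic;

// A command pairs a human-readable description with the closure that applies it.
public sealed record Command(string Description, Action Apply);

// The "plan" is just an ordered list of commands.
public sealed class Commands : List<Command>
{
}

With these types, CreateCommand reduces to new Command(description, action), and the final loop can log each Description before invoking Apply().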
Published by marco on 18. Jan 2023 10:09:22 (GMT-5)
It is currently not possible to hide individual folders or files in an Azure DevOps Code Wiki. Folders and files beginning with a . are hidden by default, but you can't influence the structure other than by reordering pages with a .order file in an individual folder.
The topic Hide folders that do not contain Markdown files (Microsoft Developer Community) discusses extending this functionality.
I replied with the following:
There are a lot of good suggestions here.
Changing the name of the folder or file in order to hide it (e.g., by prepending the name with .) is not a practical solution. Wikis based on, e.g., .NET solutions cannot just change the names of folders that would be empty in the Wiki.
Although I think that hiding empty folders by default seems like a good idea, I also understand that clicking an empty folder shows the UI that allows a user to create a page for an empty folder, so hiding that folder would also remove functionality from the online UI.
I think that many code-based Wikis wouldn’t mind losing this functionality, but we probably need a top-level Code Wiki option here where you can decide whether to show or hide empty folders by default.
That takes care of the default behavior, which would cover a lot of use cases for “cleaning up” the wiki’s structure.
However, if you elect not to hide folders by default, or if you just want to hide another file or folder, how can we support that requirement? I would suggest two mechanisms:
- A .wikiignore file that allows globbing à la Git would be powerful (e.g., it would allow you to ignore all Properties folders in all project folders in .NET solutions).
- Extending the .order file to support !, which would hide the folder or file from being displayed. This feature would technically also cover all use cases covered by a .wikiignore file, but would involve quite a bit more work to support (i.e., you would have to add a .order file to every Properties folder instead of just configuring it once, in a root file).
Published by marco on 15. Jan 2023 11:10:38 (GMT-5)
In the article Why tuples in C# are not always a code smell by Dennis Frühauff (dateo. Coding Blog), the author writes the following code for calculating a discount.
The requirements are as follows:
- Premium customers get 20% off.
- Gold customers get 30% off.
- Regular customers, when they are students (< 25 years), get 10% off.
- Regular adult customers get no discount.
- All regular customers get 15% off during happy hour (3 to 8 p.m.).
public decimal CalculateDiscount(Customer customer, DateTime time)
{
    if (customer.CustomerType == CustomerType.Gold)
    {
        return 0.3m;
    }
    else if (customer.CustomerType == CustomerType.Premium)
    {
        return 0.20m;
    }
    else
    {
        if (time.Hour is > 15 and < 20)
        {
            return 0.15m;
        }
        if (customer.Age < 25)
        {
            return 0.1m;
        }
        else
        {
            return 0m;
        }
    }

    return 0m;
}
He doesn’t like this code, and neither do I. But we have different reasons.
The author rewrites the code above with pattern-matching, to make it “pretty much look like the business rules stated above”.
His final version looks like this:
public decimal CalculateDiscount(Customer customer, DateTime time)
{
    return (IsStudent(customer), IsHappyHour(time), customer.CustomerType) switch
    {
        (_, _, CustomerType.Gold) => 0.3m,
        (_, _, CustomerType.Premium) => 0.2m,
        (_, true, CustomerType.Regular) => 0.15m,
        (true, false, CustomerType.Regular) => 0.10m,
        (false, false, CustomerType.Regular) => 0.0m
    };
}

public bool IsStudent(Customer customer) => customer.Age < 25;

public bool IsHappyHour(DateTime datetime) => datetime.Hour is > 15 and < 20;
I strongly disagree that this looks like the original business requirements. In order to figure out who gets a 15% discount, you have to figure out what the first two boolean fields of the tuple indicate, so you look at the ad-hoc-instantiated tuple (which is created only in order to pattern-match on it), where you can see from the local-method names that they indicate whether the customer is a student and whether the sale was made during happy hour, respectively.
I have a few issues with this version; for one, the switch over an ad-hoc tuple (with all of those _ placeholders) makes it look difficult to maintain.

I would tackle this differently, and with classic means. First of all, my main problem with the original version is that it's made unnecessarily long and cluttered by including else statements after returns. Get rid of those and you'll get rid of indenting and, all of a sudden, the original code looks remarkably legible. It's also 100% clear that there are no allocations and we don't have to worry our pretty heads about the efficiency of code generated for either if and return statements or for simple comparisons.
public decimal CalculateDiscount(Customer customer, DateTime time)
{
    if (customer.CustomerType == CustomerType.Gold)
    {
        return 0.3m;
    }

    if (customer.CustomerType == CustomerType.Premium)
    {
        return 0.20m;
    }

    if (time.Hour is > 15 and < 20)
    {
        return 0.15m;
    }

    if (customer.Age < 25)
    {
        return 0.1m;
    }

    return 0m;
}
How much clearer would you like that to be? I suppose we could add some local methods to add some semantics to the comparisons.
public decimal CalculateDiscount(Customer customer, DateTime time)
{
    if (IsLevel(CustomerType.Gold))
    {
        return 0.3m;
    }

    if (IsLevel(CustomerType.Premium))
    {
        return 0.20m;
    }

    if (IsHappyHour())
    {
        return 0.15m;
    }

    if (IsStudent())
    {
        return 0.1m;
    }

    return 0m;

    bool IsLevel(CustomerType customerType) => customer.CustomerType == customerType;
    bool IsStudent() => customer.Age < 25;
    bool IsHappyHour() => time.Hour is > 15 and < 20;
}
To make up for the fact that we lost all of that delicious pattern-matching and those tuples from the author's version, we're using local methods. Is this an improvement? Overall, I think so. The first version was already pretty good, but now we've improved the semantics by taking the guesswork out of the magic numbers. The IsHappyHour method is definitely an improvement. The IsStudent method also imparts more knowledge about what the magic age of 25 means. Also, we've managed to separate the calculation of the rebate from the determination of the conditions that affect the rebate.
Can we do anything with pattern-matching, though? Can we use pattern-matching in a way that’s more legible than the version proposed by the author?
What about this?
public static decimal CalculateDiscount(this Customer customer, DateTime time)
{
    return (customer, time) switch
    {
        ({ CustomerType: CustomerType.Gold }, _) => 0.3m,
        ({ CustomerType: CustomerType.Premium }, _) => 0.2m,
        (_, { Hour: > 15 and < 20 }) => 0.15m,
        ({ Age: < 25 }, _) => 0.1m,
        _ => 0m
    };
}
OK. That’s not as bad as the author’s version. It doesn’t allocate a tuple just to be able to use a tuple, for starters. But is it more legible than the previous version? Not at all. We could, of course, improve the formatting to align all of the return statements, but that’s also no fun to maintain.
The real issue with the pattern-matching solution is that we can no longer use local functions to improve semantics. The only thing we could do would be to add an IsStudent property directly to the class (extension properties are still being discussed (GitHub)). We cannot improve the semantics of the pattern-matching on DateTime because that type is not under our control.
In conclusion, as with anything else in programming, you should be judicious in where you use the new and shiny features, always considering whether they’re actually helping improve your code.
Published by marco on 11. Jan 2023 21:21:28 (GMT-5)
I’ve seen this Noob question: Does anyone use things like git gui? by Collekt (Reddit) again and again.
“Just curious as I’m learning and getting familiar with git. Do real production teams use any kind of tools for git like “git gui” or others? Or does everyone just use it from command line? Thanks for any insight. :)”
You almost certainly have several use cases for your source control:
The command-line isn’t the most efficient or least error-prone for any of these tasks.
For example—something you do every day—a good GUI client will let you very quickly navigate diffs in your working tree with only a few arrow-key presses. You can’t beat that with the command line.
And, once you have to merge … you’ll want a more powerful view on things than you’re going to get from command-line tools. Of course, it’s possible to merge on the command-line! I’m just saying it’s more error-prone and not as efficient—especially for most developers. There are probably a couple of John Henrys out there, but c’mon.
It’s great that the command-line exists! It allows us to build UIs on top of it. It allows us to integrate anything we’d like into a headless process like CI/CD.
However, you’re going to be more efficient with a good GUI. There are pros/cons to the various UIs. I’ve landed quite firmly on SmartGit after an evaluation of all of the other tools (in no particular order: Tower, VS, VSCode, GitLens, Kraken, GitExtensions, GitHub Desktop, SourceTree, Git GUI).
Why an external rather than an integrated Git client?
Why an integrated rather than external Git client?
You can use both, of course! Use whatever helps you be more accurate and efficient and happy.
Visual Studio Code’s default source control is very limited (no code forensics to speak of), so be careful of defaulting to that one. Visual Studio is getting better all the time, though. Still feels a bit weird for me, but it’s 10x better than it was a couple of versions ago.
Of course, YMMV, but please don't continue to believe in the myth that using a command line is somehow a requirement for being a “real” developer. Developers who only use the command line are probably wasting time, probably making mistakes they shouldn't, and almost certainly missing out on powerful enhancements to their workflow.
Published by marco on 11. Dec 2022 22:53:38 (GMT-5)
The article ”Thousand” Values of CSS by Karl Dubost (Otsukare) clarifies the definitions for the various types of value in CSS. While there aren’t a thousand different kinds of value in CSS, there are quite a few. Each has its raison d’être.
The article is informative, but lists the values in what I consider to be an unintuitive order. I’ve changed the order and consolidated a bit. Each term links to the W3C documentation [1] and each definition starts with the official description, a layman’s translation, and a simple code example.
Click to jump to the definition or read them in order to learn how they build on each other.
“Each property has an initial value, defined in the property’s definition table. ”
I.e. the initial value could also be called the default value, as defined in the specification.
p {
  /* the initial value of color is black */
}
“Each property declaration applied to an element contributes a declared value for that property associated with the element.”
I.e. the declared value is the one that you’ve directly assigned to a property in a CSS element.
p {
  color: red; /* declared value is red */
}
“The cascaded value represents the result of the cascade: it is the declared value that wins the cascade (is sorted first in the output of the cascade). If the output of the cascade is an empty list, there is no cascaded value.”
I.e. the cascaded value is the declared value that sorts first in the list generated by the cascade of declared values that apply to that element.
p {
  color: red; /* declared value is red */
}

p {
  color: green; /* declared and cascaded value is green */
}
“The specified value is the value of a given property that the style sheet authors intended for that element. It is the result of putting the cascaded value through the defaulting processes, guaranteeing that a specified value exists for every property on every element.”
I.e., the specified value is the cascaded value, or the default value for that property, if there are no cascaded values.
p {
  color: red; /* declared value is red */
}

p {
  color: green; /* declared, cascaded, and specified value is green. */
  /* Also, the specified value for, e.g., margin-left is 0
     because that's the default, and no value was specified. */
}
“The computed value is the result of resolving the specified value as defined in the “Computed Value” line of the property definition table, generally absolutizing it in preparation for inheritance.”
I.e., the computed value is the specified value, but converted to absolute units (e.g., 2em converts to 32px if the font-size is 16px), or to a special value like auto.
html {
  font-size: 16px;
}

p {
  font-size: 2em;
  /* declared, cascaded, and specified value are 2em,
     but the computed value is 32px. */
  /* The computed value of width is auto because there is no declared
     value, so the specified value is the initial value. */
}
“The used value is the result of taking the computed value and completing any remaining calculations to make it the absolute theoretical value used in the formatting of the document.”
I.e., the used value is the computed value, but special values are converted based on context. E.g., a computed value of width: auto will have a used value of width: 100px if the parent container is 100px wide.
body {
  width: 100px;
}

p {
  width: auto;
  /* declared, cascaded, specified, and computed value are auto,
     but the used value is 100px. */
}
“A used value is in principle ready to be used, but a user agent may not be able to make use of the value in a given environment. For example, a user agent may only be able to render borders with integer pixel widths and may therefore have to approximate the used width. Also, the font size of an element may need adjustment based on the availability of fonts or the value of the font-size-adjust property. The actual value is the used value after any such adjustments have been made.”
I.e., the actual value is the used value, but adjusted as necessary for the output device.
p {
  border-width: 1.1px;
  /* declared, cascaded, specified, computed, and used value are 1.1px,
     but the actual value is 1px. */
}
Despite the name, the value returned by the getComputedStyle() method will be either the computed or the used value, depending on the type of property. The result of this method is called the resolved value.
body {
  width: 100px;
}

p {
  width: auto;
}

const p = document.querySelector('p');
const resolvedValue = window.getComputedStyle(p).width;
/* resolvedValue == "100px" */
Published by marco on 4. Dec 2022 22:11:39 (GMT-5)
As software developers, we are constantly making the decision between make or buy.
Deciding to make something carries with it the obligation to design, develop, test, document, and support it. You’ll have everything under your control, but you’ll also have to do everything yourself.
If a component is not part of your project’s core functionality, then it’s often a good idea to look around and see if you can find someone who’s already built that functionality. Optimally, the component you find will be free and open-source and will have been built by a team whose aim was to provide exactly that functionality.
Because they’ve focused on their task, it’s more likely to be a robust solution to your problem than what you would write yourself (focused, as you hopefully are, on your task). Their solution might go a bit too far (see “Size/Focus”), but that might be fine too (see “Extensibility”).
Is the component good, though? What do we mean by “good”? How can we tell? How do we go about sizing up a dependency?
The following table outlines various facets to consider.
I include my own notes below.
Published by marco on 21. Nov 2022 22:48:57 (GMT-5)
Updated by marco on 7. Dec 2022 22:47:43 (GMT-5)
The articles Twelve C# 11 Features by Oleg Kyrylchuk and Welcome to C# 11 by Mads Torgersen (Microsoft .NET Blog) provide an excellent overview with examples of new features in C# 11, available with .NET 7.0.
I include my own notes below.
“Obvious” to me, at least. The terms link to examples in one of the articles linked above.
Add u8 to the end of a literal string to make it UTF-8 instead of the system-standard UTF-16. For example, “Test string”u8 will be encoded by the compiler as UTF-8 and will have the type ReadOnlySpan<byte>.
With raw string literals, you can finally just paste formatted and indented JSON into C# code, interpolate some variables, and do it all without escaping anything! [1]
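Here is a minimal sketch of my own (not from either article), assuming C# 11 on .NET 7, that combines both features: a UTF-8 literal and an interpolated raw string literal.

using System;

class LiteralsDemo
{
    static void Main()
    {
        // The u8 suffix makes the compiler emit UTF-8 bytes; the type is ReadOnlySpan<byte>.
        ReadOnlySpan<byte> utf8 = "Test string"u8;
        Console.WriteLine(utf8.Length); // 11 bytes

        var name = "world";

        // Raw string literal: quotes and braces need no escaping.
        // The $$ prefix means interpolation uses double braces.
        var json = $$"""
            {
                "greeting": "Hello, {{name}}!",
                "indented": true
            }
            """;
        Console.WriteLine(json);
    }
}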
“In fact .NET 7 comes with a new namespace System.Numerics chock-full of math interfaces, representing the different combinations of operators and other static members that you’d ever want to use. […] All the numeric types in .NET now implement these new interfaces – and you can add them for your own types too! So it’s now easy to write numeric algorithms once and for all – abstracted from the concrete types they work on – instead of having forests of overloads containing essentially the same code.”
See here for an example of using generic parameters in operators, or Generic Math for an example that uses some of the new interfaces, like IAdditionOperators and ISubtractionOperators.
In that vein, there are a lot more interfaces that support generalized computation, like ISpanParsable<TSelf> Interface, which “[d]efines a mechanism for parsing a span of characters to a value.”
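As a quick illustration, here is a minimal sketch of my own (the helper name is made up) that sums any numeric type via INumber<T>:

using System.Collections.Generic;
using System.Numerics;

static class MathHelpers
{
    // Works for int, long, double, decimal, Half, BigInteger, ...
    public static T Sum<T>(IEnumerable<T> values) where T : INumber<T>
    {
        var total = T.Zero;
        foreach (var value in values)
        {
            total += value;
        }

        return total;
    }
}

// MathHelpers.Sum(new[] { 1, 2, 3 });    // 6
// MathHelpers.Sum(new[] { 1.5, 2.25 });  // 3.75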
“Another ongoing theme that we’ve been working on for several releases is improving object creation and initialization. C# 11 continues these improvements with required members.”
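For example, a minimal sketch (the type and property names are my own) of what required members look like:

public class Person
{
    public required string Name { get; init; }
    public required string Email { get; init; }
    public int? Age { get; init; }
}

// OK: all required members are set in the object initializer.
// var ada = new Person { Name = "Ada", Email = "ada@example.com" };

// Compile-time error: required members 'Name' and 'Email' are not set.
// var nobody = new Person();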
Generic attributes: [Generic<MyType>] declares an attribute of type GenericAttribute parametrized with MyType.
Extended nameof scope: you can now use nameof with “method parameter[s] in an attribute on the method or parameter declaration.”
StringSyntaxAttribute: where syntax-aware highlighting and validation used to be limited to a handful of well-known APIs (e.g. RegEx or DateTime.Format), this is a welcome standardization that gives your own APIs the same star treatment. The post What does the StringSyntaxAttribute do? includes a list of the syntaxes supported out-of-the-box. The post StringSyntaxAttribute for syntax highlighting provides examples and screenshots.
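Here is a minimal sketch of my own showing the attribute on a hypothetical logging API (the method is made up; the attribute and its constants live in System.Diagnostics.CodeAnalysis in .NET 7):

using System;
using System.Diagnostics.CodeAnalysis;

public static class Log
{
    // Tells tooling that the argument is a composite format string,
    // so editors can colorize and validate it at the call site.
    public static void Write(
        [StringSyntax(StringSyntaxAttribute.CompositeFormat)] string format,
        params object[] args)
    {
        Console.WriteLine(string.Format(format, args));
    }
}

// Log.Write("Started at {0}", DateTime.Now);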
A few that seem a bit dubious but are, I guess, welcome additions that will be useful to someone:
- List patterns: numbers is [_, >= 2, _, _] returns true if numbers is a four-element list where the second element is greater than or equal to 2.
- The System.Numerics interfaces and the increased generality offered by abstracting over static members (linked above).
- Source-generated regular expressions: this feature leverages the source generation that’s been available since .NET 5 to avoid JIT for regular expressions by generating code for them directly. It’s really great to see the .NET team getting mileage out of the features they’re adding (I’m sure this isn’t a coincidence).
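A minimal sketch of my own (the class and pattern are made up) of what the source-generated approach looks like in C# 11 / .NET 7:

using System.Text.RegularExpressions;

public static partial class Validators
{
    // The source generator emits the matching code at compile time,
    // instead of interpreting or compiling the pattern at runtime.
    [GeneratedRegex(@"^\d{4}-\d{2}-\d{2}$")]
    public static partial Regex IsoDate();
}

// Validators.IsoDate().IsMatch("2022-11-21");  // true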
For another example of source-generation, see Generating PInvoke code for Win32 apis using a Source Generator by Gérald Barré (Meziantou's Blog), which explains how to use Microsoft’s NuGet package Microsoft.Windows.CsWin32
to easily generate source for any Win32 API or type—no more writing this stuff manually!
Check out the following animation of converting an escaped string to a raw string in Rider (from the post Rider 2022.3: Support for .NET 7 SDK, the Latest From C#11, Major Performance Improvements, and More! by Sasha Ivanova (The .NET Tools Blog)):
Published by marco on 20. Sep 2022 21:45:20 (GMT-5)
The article Agile Projects Have Become Waterfall Projects With Sprints by Ben Hosking (ITNext) argues that a lot of projects using agile aren’t agile at all, but are “more like waterfall projects with upfront requirements, fixed deadlines, sprints and 2 weekly demos.”
Overall, I understand where the author is coming from, but I found the tone pretty overwhelmingly negative. I can only imagine what the author has seen to have put them in such a dark place. 😐
I thought that this was an interesting comment in the article:
“You cannot create fixed deadlines unless you know all the requirements and guarantee no requirements are changed.”
However, you can create fixed deadlines (the world kind of expects them sometimes, e.g. when you’re preparing for a conference that happens on a specific day), but then you have to be willing to adjust what will be delivered on that day.
Agile started out in a world where a partial product could be delivered and still have value. That is not the case with all projects. Thus, the designations MVP (Minimum Viable Product) and MMP (Minimum Marketable Product).
Even agile projects have to be honest about what the minimum time frame is for an MVP, though. Where some projects have an advantage is that they can iterate in smaller increments after that, but also can deliver useful, though nonviable pieces as artifacts of iterations. There are some projects where it’s more difficult to carve out such deliverables.
Although there is always work that has been planned and successfully accomplished and documented, it’s sometimes hard to measure or see progress until a larger amount of work has been done. I suppose that’s the art of planning and measuring.
Here, it’s also useful for technical team members or more technically oriented teams to learn how to consider administrative, planning, design, and documentation work as just as useful as producing technical artifacts (be they physical or virtual).
A waterfall process doesn’t help figure out what to do when the delivery cannot be completed on time. It (generally) has no plan for what to drop if you can’t deliver on time. Also, it doesn’t really have any ideas for what to do when new things “crop up”. An agile process is supposed to help you triangulate toward a version of the product that can actually be delivered by the target date—or help you better (and sooner) predict whether it’s even possible to deliver anything useful by that date.
I think you have to be honest about which projects really can be run in an agile way—but then also make sure that they take advantage of agility to be bolder than they have been.
Release early, release often, think about what your MVP is, all of those things are good to take from the agile process. As far as the “ceremony” of the process goes: I have always found value in the review and retros.
Published by marco on 2. Sep 2022 04:28:25 (GMT-5)
Someone asked is there a js library that animates the text word by word like shown? by DemDavors (Reddit).
A bunch of people answered “just do it with CSS!” and one or two recommended using GSAP (Green Sock Animation Platform). I’d just heard about that library in the following instructive video and had had a chance to investigate how it works.
I’d like to expand on the comments recommending to use the “rule of least power”. They are absolutely correct, but you have to consider the entire task:
For those who already know how to do this and are trying to limit JS as much as possible then, by all means, use CSS only.
For anyone else, “least power” means using CSS where possible, but not necessarily excluding JS if doing so improves maintainability, enhances developer speed and accuracy, and reduces errors.
If you look at what GSAP does, it generally maps a high-level JS animation API to CSS animations and transitions. The concession you’ve made is to include animations using a relatively thin layer of JavaScript. That thin layer, though, is a change in technology (more power), which ensures that the animations will no longer work if JavaScript is disabled. However, you’re actually using CSS animations under the hood, benefiting from the high-level and highly optimized implementations in the browser. So you’ve lost flexibility as far as user agents are concerned, but the performance is the same, and you’ve probably saved time debugging and tweaking the implementation.
That might be a better balance for those developers who would have no idea how to animate the given example with native CSS. If they did that, they would have to first learn how to do it, taking up a lot of time, to say nothing of the fact that they might end up creating a suboptimal implementation, both performance- and maintenance-wise.
Telling someone to “just use CSS” is technically correct, but also sounds a lot like answering “just use pipes” when someone asks how to install a toilet. There’s a bit of detail missing there.
Published by marco on 2. Sep 2022 04:27:24 (GMT-5)
I recently answered the question What features from other languages would you like to see in C#? by BatteriVolttas (Reddit)
I think Anchored Declarations and Qualified Anchored Declarations from Eiffel would be very useful.
I like the name “anchored” because you’re anchoring the type of one thing to another. Instead of using int throughout a class, you can just make e.g. a field named _id be an int and then make all other types (e.g. for the parameter passed to a method) refer to the anchor with like _id or typeof _id.
If the type of the field ever needs to change, you only need to update one place. It’s more expressive because the alternative is to explicitly write the type of the parameter, whereas that was never what was going on. The method doesn’t decide what the type is; we’re just used to syncing it to the type of the field manually because there is no way to express the relationship in most languages we’re using.
Here’s an example:
class A
{
int Status { get; set; } = 0;
like Status PriorStatus { get; }
void Start(like Status s) {}
void Stop(like Status s) {}
}
The syntax is similar to how ref
and out
work now, but looking at it takes a bit of getting used to, especially for the property declaration.
TypeScript has this feature, with the typeof operator, but they don’t name it. TypeScript has two advantages here: it places the type after the variable name, which feels a bit more natural when the type is expressed with multiple words, and TypeScript has implicit return types, so you don’t have to write the type at all in many cases.
Because of the implicit typing, TypeScript has technically had anchored types all along!
class A
{
  status: number = 0;

  // The implicit return type here is derived from "status",
  // which "anchors" the type of the getter to that field.
  get priorStatus() { return this.status; }

  // Here we're obliged to constrain the type explicitly.
  start(s: typeof this.status) {}
  stop(s: typeof this.status) {}
}
As of TypeScript 4.7, it supports qualified anchored declarations on private fields as well.
Someone suggested in a response that generics might fill this bill already.
In a way, yes, that’s true. I could define the whole class with a generic type argument and then create a derived type that fixes the type argument to int.
class A<TStatus>
  where TStatus : INumber<TStatus>
{
  TStatus Status { get; set; } = TStatus.Zero;

  TStatus PriorStatus { get; }

  void Start(TStatus s) {}
  void Stop(TStatus s) {}
}

class IntA : A<int> {}
We have to use the newest features from C# 11 in order to be able to initialize the value to 0. If it were a value that doesn’t map to a mathematical concept (like the additive or multiplicative identity), then we wouldn’t be able to use the generic approach.
It feels a bit like misuse of generics, though, when I just wanted a shorthand for letting one type reference another. As I wrote, TypeScript already allows this and seems to have found it a useful addition to generics (you can probably implement it under-the-hood with the same code in the compiler).
I feel the same way about the missing type declaration from TypeScript (or the very similar, but less powerful typedef from C or Pascal).
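For what it’s worth, the closest existing C# equivalent I know of is a using alias, which is file-scoped and much less powerful than a TypeScript type declaration. A minimal sketch (the names are made up):

using CustomerId = System.Int32;
using CustomerLookup = System.Collections.Generic.Dictionary<int, string>;

public static class Orders
{
    // Callers see the intent (CustomerId), but the alias doesn't leave this file.
    public static string FindName(CustomerLookup customers, CustomerId id)
        => customers.TryGetValue(id, out var name) ? name : "unknown";
}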
Published by marco on 21. Mar 2022 22:50:55 (GMT-5)
The article When to Avoid the text-decoration Shorthand Property by Šime Vidas (CSS Tricks) makes a couple of interesting points. Basically, you have a lot of control over how underlines are drawn on text.
- Combine text-decoration: underline (MDN) with text-decoration-thickness (MDN) and text-underline-offset (MDN).
- Use :any-link (MDN) to select links that actually have an href attribute rather than selecting all links.
- text-decoration is a shorthand property, which means that setting it overwrites all of the properties that it might represent (including the underline thickness). (MDN)

The article doesn’t mention these, but,
- text-decoration-skip (MDN) controls how to underline whitespace.
- text-decoration-skip-ink (MDN) controls whether a text decoration (underline or overline) can touch the ascenders or descenders of glyphs.

The following text has the style text-decoration: underline .4em; text-underline-offset: .4em. Note that it doesn’t affect the bounding box.

The following text has the style text-decoration: underline; text-decoration-skip: spaces; text-decoration-skip-ink: all. Note that text-decoration-skip only works with Safari at the time of writing.
Published by marco on 24. Jan 2022 17:20:05 (GMT-5)
Have you noticed that there is more and more content available to help you learn how to program? For every topic under the sun, there seems to be a blog article or video of superficially reasonable quality. For every question on StackOverflow, there’s an effusive answer with examples.
This is all pretty great, honestly.
However, with the increase in content, there is also the need to be able to wade through it.
How old is that StackOverflow answer? How appropriate is the answer to your particular question? Are there other solutions? Maybe easier ones? Maybe more modern ones? Has this solution to this particular problem been addressed in more recent versions? This isn’t new, of course. You should have been asking yourself questions like this for quite a while with these so-called expert-community sites.
However, now, we’re also inundated with content from people hustling to make a living as professional, freelance, advice-givers online. This is not a bad thing, necessarily. It’s great that the unsung masters that formerly only provided value inside of a single company are bringing their didactic abilities to the world. That’s not all that they’re doing, though.
Those who are on a subscriber model have to publish content in order to keep their subscribers. They don’t even necessarily have to produce anything of lasting value—they just have to produce something. They just have to retain and/or grow their subscriber base. This leads to nice-looking, but ultimately useless “fluff” content that rehashes an old concept with a few flashy graphics or an accompanying video. And the videos! Many of them take 15 minutes to explain a concept that you could describe adequately in a paragraph and a code example.
The Microsoft MVP bloggers are very conspicuous these days: there are many who are publishing an article or two per week “explaining” a C# 10 feature that has already been explained to death in dozens of other high-profile articles—to say nothing of the article Welcome to C# 10 by Kathleen Dollard (Microsoft Dev Blogs), which comes straight from the horse’s mouth, is wonderfully written, and, honestly, says all there needs to be said about these features.
But, if you search for “C# 10”, there is a flood of repetitive and, sometimes, outdated, information on C# 10. And these authors are all still churning out the articles. They’re doing it for the clicks, for the ad-views, for the subscribers. It’s a living. I get it. But, overall, it contributes to a very muddled picture that makes it difficult for people looking for advice and assistance.
Published by marco on 22. Jan 2022 12:22:44 (GMT-5)
If you want to test or hone your CSS skills, check out the CSS Speedrun. It lets you warm up with a relatively easy “intro”, then takes you through ten levels. Generally, each level tests a different feature of CSS (usually a specific selector). The final question (pictured) makes you combine what you’ve learned or used from other levels.
The image below is from my second time through. The first time through I needed about nine minutes; the next morning, I got through much more quickly. I guess I’d learned something. 🎉 for me.
Published by marco on 21. Jan 2022 11:12:46 (GMT-5)
I’ve known about nth-child(n) for a long time. It selects the nth child from a structure if that child happens to match the given tag. You can always select the nth child by omitting the tag.
For example, div :nth-child(2) (two selectors) will match the second child of any div, regardless of type. However, div span:nth-child(2) will only match if the second child is also a span.
You cannot write a selector that says “select the second span” using nth-child. That’s where nth-of-type(n) comes in. The selector div span:nth-of-type(2) does exactly that. I can’t recall that I’ve ever had this need before, but it’s also possible that I ended up adding extra tags or convoluted selectors in order to achieve what could have been more elegantly done with nth-of-type.
Additionally, while I was aware that nth-child supported constants and the keywords odd and even, I didn’t know that it also supported a formula an + b. The a is a multiplier and b is an offset. With this formula, you can select every third or fifth (or whatever) element and then move the selection by a given offset.
The selectors first-of-type, last-of-type, etc. also exist, as well as only-of-type, which matches an element when it’s the only child of that type in the parent. See Meet the Pseudo Class Selectors by Chris Coyier (CSS Tricks) for more information.
You may see where this is heading. The article The wondrous world of CSS counters by Chen Hui Jeng includes an example where he writes the famous FizzBuzz program with CSS.
Start with an ordered list,
<ol>
  <li></li>
  <!-- …add more li elements, like 30 of them… -->
  <li></li>
</ol>
Then apply the following CSS to it,
ol { list-style-position: inside } /* To line-up all items neatly */
li:nth-of-type(3n+3),
li:nth-of-type(5n+5),
li:nth-of-type(3n+3):nth-of-type(5n+5) {
list-style: none /* When text of Fizz, Buzz or FizzBuzz appears, get rid of the numbers */
}
li:nth-of-type(3n+3)::before { content: "Fizz" }
li:nth-of-type(5n+5)::before { content: "Buzz" }
li:nth-of-type(3n+3):nth-of-type(5n+5)::before { content: "FizzBuzz" }
Put it all together and you get CSS FizzBuzz.
Published by marco on 28. Dec 2021 23:45:47 (GMT-5)
I recently read through the a11y myths. They’re quite interesting and should be required reading for managers running projects that develop web sites.
From it, I learned about the evils of overlays (see the Overlay Fact Sheet) and that there are really good resources out there, like Understanding Conformance (W3C) with WCAG 2.0 (Web Content Accessibility Guidelines).
“All WCAG 2.0 Success Criteria are written as testable criteria for objectively determining if content satisfies them. Testing the Success Criteria would involve a combination of automated testing and human evaluation. The content should be tested by those who understand how people with different types of disabilities use the Web.”
If you build custom controls, you should use ARIA (MDN). That page includes the following note,
“Many of these widgets were later incorporated into HTML5, and developers should prefer using the correct semantic HTML element over using ARIA, if such an element exists. For instance, native elements have built-in keyboard accessibility, roles and states. However, if you choose to use ARIA, you are responsible for mimicking (the equivalent) browser behavior in script.”
If you do need to use ARIA, then there’s a set of rules for its use in the article Notes on ARIA Use in HTML (W3C).
While we’re on the topic of building your own custom controls instead of using the built-in HTML inputs, we can also talk about how good semantics go a long way toward good accessibility, right out of the gate. So, go ahead and use main, nav, header, footer, aside, section, and article.
There’s some really good advice in there on writing clearly (e.g. use full month names and clarify abbreviations) as well as using meaningful text in links (e.g. don’t just use “click” or “here”).
Published by marco on 26. Dec 2021 09:24:51 (GMT-5)
I hadn’t ever really thought about it because I don’t use the API very much, but it turns out that the border-radius property is not only a shorthand for setting all four corners at once, but also sets the horizontal and vertical lengths simultaneously. To set them individually, use a / between two values.
The corner radii are then calculated using ellipses as shown in the following visualization,
The article CSS Border-Radius Can Do That? by Nils Binder on October 9, 2018 (9 elements) has many more examples. It also introduces a Fancy-Border-Radius tool to help you create the desired shape visually.
CSS includes the much more generalized shape() API (MDN) [1], but it wouldn’t be as easy to define the “blobs” shown above with that API because the “blob” is defined by the intersection of four overlapping ellipses and the shape() API doesn’t allow combining multiple shapes into one shape.
Not only that, but the “blob”, as defined by the eight values shown above, can be quite easily animated by providing the end “blob” to a transition or by providing several “blobs” to tweenable @keyframes. You can see the technique in action in this CodePen. Scroll all the way down in the CSS definition to see that the effect uses a combination of morphing the border-radius and rotating using a transform to achieve a quite-complex and organic effect using only very straightforward and highly available CSS.
@keyframes morph {
0% {border-radius: 40% 60% 60% 40% / 60% 30% 70% 40%;}
100% {border-radius: 40% 60%;}
}
@keyframes spin {
to {
transform: rotate(1turn);
}
}
You can even use “tricks” to create many shapes without using the shape() API either. See The Shapes of CSS by Chris Coyier (CSS-Tricks) for many, many examples.
Published by marco on 23. Dec 2021 15:30:47 (GMT-5)
Most of us know “hackers” from the media—either the news media, television shows like Mr. Robot, or movies like Swordfish. But the fast and easy way of hacking presented in the media actually does a disservice to how incredibly clever these hacks really are.
Less-complex techniques—like guessing or brute-forcing passwords—still work super-well. And you’ve always got social engineering hacks, like just asking someone for their credentials in an official-sounding way. But real, technical hacking involves getting to know a system’s dependencies and memory layout and runtime environment even better than the original programmers ever did.
Note: Both of these issues have been fixed, but it’s fascinating to read about how they did it. It really offers insight into what to avoid doing in your own code (e.g. do not open a WebSocket on 0.0.0.0).
The first article A deep dive into an NSO zero-click iMessage exploit: Remote Code Execution by Ian Beer & Samuel Groß (Google Project Zero) is a longer read, but I found it fascinating how many pieces they needed to chain together in order to hack iMessage—which they managed to do with a 0-click exploit. Just sending a message to the phone with a specially coded picture in it was enough to trigger code to run automatically that, unfortunately, ran before the sandbox. It overwrote memory in a controlled manner—making sure not to crash the app—and set up its own virtual machine to execute arbitrary code, which it then did.
“JBIG2 doesn’t have scripting capabilities, but when combined with a vulnerability, it does have the ability to emulate circuits of arbitrary logic gates operating on arbitrary memory. So why not just use that to build your own computer architecture and script that!? That’s exactly what this exploit does. Using over 70,000 segment commands defining logical bit operations, they define a small computer architecture with features such as registers and a full 64-bit adder and comparator which they use to search memory and perform arithmetic operations. It’s not as fast as Javascript, but it’s fundamentally computationally equivalent.
“The bootstrapping operations for the sandbox escape exploit are written to run on this logic circuit and the whole thing runs in this weird, emulated environment created out of a single decompression pass through a JBIG2 stream. It’s pretty incredible, and at the same time, pretty terrifying.”
The second hack is less wide-reaching, in that it would apply only to certain software developers using certain tools, which automatically limits the audience. The RCE in Visual Studio Code’s Remote WSL for Fun and Negative Profit by Parsia (Hackerman's Hacking Tutorials) describes, in relatively easy-to-follow detail, how the author found a pretty big hole in the remote-debugging support for Visual Studio Code using WSL (Windows Subsystem for Linux).
In order for it to work, the user had to approve opening the port in the Windows Firewall, but it was kind of unconscionable that it opened such a big hole. The developer could be forgiven for thinking that it was OK to approve the request, given that they had just initiated an action to debug between machines. Approving a firewall request in that situation is not only expected, but incredibly common. The dialog box doesn’t provide any information about which ports it wants to open.
The Local WebSocket Server
Every time you see a local WebSocket server, you should check WHO can connect to it.
“WebSocket connections are not bound by the Same-Origin Policy and JavaScript in the browser can connect to local servers.”
— TL;DR WebSockets
WebSockets start with a handshake. It is always a “simple” (in the context of Cross-Origin Resource Sharing or CORS) GET request so the browser sends it without a preflight request.
These bugs can be chained:
- The local WebSocket server is listening on all interfaces. If allowed through the Windows firewall, outside applications may connect to this server.
- The local WebSocket server does not check the Origin header in the WebSocket handshakes or have any mode of authentication. The JavaScript in the browser can connect to this server. This is true even if the server is listening on localhost.
- We can spawn a Node inspector instance on a specific port. It’s also listening on all interfaces. External applications can connect to it.
- If an outside app or a local website can connect to either of these servers, they can run arbitrary code on the target machine.
Published by marco on 23. Dec 2021 09:55:59 (GMT-5)
I just finished reading through the State of CSS 2021. It’s a well-presented [1] summary of a developer survey about CSS.
I liked the following sections:
Published by marco on 13. Nov 2021 13:36:23 (GMT-5)
I’ve been using CSS Grids for a while now. I’ve found many instances where I had used flexbox where grids turn out to be much more appropriate. That is, the grid layout algorithm lets me specify what I want without fiddling about with flex-basis and flex-grow, etc. Flexbox definitely has its place, but I think we all ended up abusing it a bit in our rush to leave tables-for-layout behind.
But that’s all in the past because now we have CSS grids available everywhere and all is well with the world! That being said, if you’ve not used CSS grids yet, then you should check out this CSS-grid super-fan’s many videos. He has a playlist of CSS Grid videos by Kevin Powell (YouTube) that you can work your way through.
He even made a short video (5min) describing how to use the grid inspector in browsers. The grid inspector is super-handy, but not so intuitive to find.
I’m more interested in what the same guy has to say about sub-grids, which are currently only available in Firefox (but it’s been available there for over 2 years now).
The 8-minute video below shows a concrete, real-world example, where you can see how little effort is required to get the browser to just align everything for you, all without fixed minimum or maximum widths (just like it used to be with tables). It should be immediately obvious why this feature is both a good thing and necessary (because the behavior can’t be replicated with existing CSS layout features).
The 11-minute video below shows how the generalized mechanism lets you do the same thing for rows:
You can find the full list of sub-grid videos (so far) in the Subgrid playlist by Kevin Powell (YouTube).
CSS sub-grids are an elegant way of aligning items without hard-coding anything (as required by existing techniques). They will continue to do what you expect regardless of the content added—i.e. there are no fixed minimum or maximum heights to make the alignment work, so you won’t be surprised when one of these artificial restrictions limits the algorithm unnecessarily (as it would with flexbox or regular grids).
You can enable Subgrid (MDN) by including grid-template-columns: subgrid. My advice to the feature designers would be to rename the value to grid-template-columns: inherit because that would be closer to the mark. Several times in the video, Kevin has to correct himself that he’s talking about the same grid rather than a copy of the grid. That’s what the nested container is doing: it’s inheriting the grid from a parent. Since it also has to declare itself as a display: grid, it can choose to inherit or explicitly set a template for its rows and/or columns. I think that would be relatively intuitive, but what do I know?
This feature kind of feels like a generalized way of getting back one of the advantages of the table-layout algorithm. The table-layout algorithm makes the cells in columns the same width throughout the table. This, despite the fact that the cells are all defined in different parents—and columns aren’t even defined as elements at all. I think we all understand why it’s not a good idea to abuse the table semantics just to be able to use the table layout algorithm. It’s nice to see that the advantages of that layout are being rescued—and generalized to be even more powerful.
Published by marco on 10. Nov 2021 11:01:27 (GMT-5)
The video I’m not sure how much longer I can wait! by Kevin Powell is an excellent introduction to sub-grids in CSS. But I was more interested in the fact that he told his viewers that,
“you can use numbers in classes, but if you have a class or id that starts with a number, it’s invalid. […] It’s one of those weird things in CSS that sometimes trips people up.”
I immediately thought to myself, “it’s not weird. Every programming language is like that.”
Then, I thought, “I bet this guy only knows CSS, so he doesn’t have anything to compare it to.”
Then, I thought, “Wait…why can’t you start an identifier with a number?”
And, finally, “I bet it’s a lexing/parsing thing.”
I’ve written several parsers for medium-sized languages and my gut feeling is that letting an identifier start with a number seems like a surefire way of making the lexer more ambiguous or pushing more work into the parsing stage.
For example, if 25L can be either an identifier or a long integer, then the parser has to figure out from context which one it is (e.g. by checking whether that identifier is declared). If it can only be a number, then it comes out of the lexer as a number token and the parser doesn’t have to disambiguate.
Even if your language doesn’t allow suffixes, you’d still have the problem with an identifier like 25, which would be legal unless you introduce the additional restriction that an identifier must have at least one alphabetic character. In that case, though, you might as well make the rule that the identifier has to start with an alphabetic character and avoid the whole ambiguity.
With that common—not weird!—rule, the disambiguation happens in the lexer, where the operation is clearer and less expensive, performance-wise.
It’s actually worse than that, though. In the case of a programming language, you could see how the following would result in a compiler ambiguity:
var 3 = 5; // I'm already confused
//…the compiler gets it, though
var a = 3; // Now, the compiler's confused as well
Is the developer assigning the value 3 to a or the variable 3? Not only is this a terrible idea for readability, the compiler can literally not resolve this ambiguity without additional information. So there have to be restrictions on identifier names in order to avoid clashes with not only reserved words (e.g. if) but also manifest constants (e.g. 3).
In the case of CSS, where you do have suffixes (e.g. 25px) but you can’t really mix class identifiers with values, it’s possible that you could get away with no ambiguities right now. So it’s not weird that you can’t start an identifier with a number—it’s perfectly natural for developers—but it is, in the case of CSS, not required for unambiguous processing. As you can see below, though, it’s still kind of confusing for the user.
What if we have a class named “3”? It’s not very expressive—we’d probably call the class something like “3-part-panel”—but it’s the pathological case. Maybe a class called “3px” would be even worse.
.3-part-panel {
/* This is fine */
}
.3 {
/* Weird, but OK */
}
.3px {
/* Now you're just being obnoxious */
}
Do we actually get any ambiguities, though? I don’t think so. I think in this case, the authors of CSS just used the “standard” (not weird!) definition of an identifier. It’s only when you have people using CSS who have had no exposure to any other programming languages (or parsing/lexing) that you get people thinking it’s “weird” that you can’t start with a number.
The only place where you could get an ambiguity is with CSS custom properties. In that case, though, “[a] custom property is any property whose name starts with two dashes”, according to CSS Custom Properties for Cascading Variables Module Level 1 (W3C). So, variable names in CSS are even more restricted than in most programming languages. Is that weird? Again, no. As in the case above with other programming languages, the end result is more clarity for the user.
For example, the following declares a few CSS custom properties with deliberately obnoxious names.
:root {
red: #F33;
color: #FF0;
0: 1;
3px: 1px;
}
.error-text {
color: var(red);
background-color: var(color);
border-width: var(3px);
opacity: var(0);
}
Although I’ve chosen confusing values and names, this doesn’t—at first glance—seem to cause any ambiguities. As with the examples above, it does force implementations to handle enumerations (e.g. all of the colors) in the parser, rather than the lexer. If the word “red” cannot be used as a variable, then it could (possibly) be recognized as its own token in the lexer, (possibly) improving performance.
The same goes for the property names. If it’s possible for custom properties to use the same names as built-in properties, then the lexer can’t handle them. There is no ambiguity because custom-property values must be resolved using the CSS function var().
The problem is worse than that, though. There is an actual ambiguity that isn’t obvious because we’re using the :root pseudo-class [1]. The example below, using <html>, makes it clearer.
html {
  color: #F33; /* Is this setting the color…
                  or declaring a color variable? */
}
This is an ambiguity that the compiler cannot resolve. So that’s why the CSS designers settled on a prefix for custom properties.
So, to a layman or user of CSS, naming restrictions on class or custom-property identifiers may seem arbitrary and “weird”, but they are a logical requirement of being able to process the grammar unambiguously.
Published by marco on 5. Jun 2021 23:04:48 (GMT-5)
Updated by marco on 11. Nov 2021 08:20:38 (GMT-5)
The article Introducing C# 10 by Ken Bonny discloses some incremental but very welcome changes to the C# language in the iteration that will be released with .NET 6 in November.
In no particular order:
- Use field in property accessors to manipulate the backing field without having to define it. This is a welcome improvement that will clean up useless boilerplate for properties that need to do something with the value before storing it (e.g. field.Trim()).
- A required keyword for properties in any of the supported types (e.g. records, classes, structs, or struct records). This lets types enforce initialization without forcing a constructor parameter. The compiler will force callers to initialize the property in the object initializer instead.
- record struct for records that are value instead of reference types.
- operator overloads in records.
- The with operator will work with anonymous classes as well as declared types.
- global usings for commonly used namespaces (e.g. System) to cut down on clutter in files.
- A namespace declared without braces will put all types in that file into that namespace. This cuts down on an indenting level in all files.
- Static abstract members in interfaces (to round out the default-implementation feature introduced in C# 9).
- Constant interpolated strings: e.g. $”Hello {Name}” is considered constant if Name is also considered constant (recursively, of course). Update on November 11th, 2021 from Dissecting Interpolated Strings Improvements in C# 10 by Sergey Teplyakov (Dissecting the Code): This feature comes with a nice performance improvement, as well. The compiler now understands interpolated strings and emits more efficient code rather than always using string.Format(), which incurred allocations for boxing, time for parsing, etc. There are even attributes to hook the compiler output that could be, e.g., “used by logging frameworks to avoid string creation if the logging level is off.”
- A !! suffix for method arguments that instructs the compiler to generate a null-check for that argument. So, string is not nullable, but not checked (i.e. the developer is responsible for including a check to avoid a NullReferenceException if one slips past the compiler), string? is nullable, and string!! is not nullable and checked. This will avoid a ton of boilerplate argument checks. Can’t wait.
- Use var to declare a variable to which you assign a manifest lambda. E.g. var isEven = (int n) => n % 2 == 0; automatically gets the type Func<int, bool>.
- Mixed declarations in deconstruction, e.g. (x3, int y3) = p; where x3 is a preexisting variable.
I really appreciate how the changes build on changes that came in previous versions. There’s a very noticeable direction that they’re pulling in with these language changes. A minimal sketch combining a few of them follows below.
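Here is a minimal sketch of my own (the names are made up) combining a few of the features above: a file-scoped namespace, a record struct, a constant interpolated string, and a lambda with an inferred natural type.

using System;

namespace Demo; // file-scoped namespace: no braces, one less level of indentation

public record struct Point(int X, int Y); // a value-type record

public static class Greetings
{
    const string Name = "world";
    const string Hello = $"Hello {Name}"; // constant interpolated string

    public static void Run()
    {
        var isEven = (int n) => n % 2 == 0; // natural type: Func<int, bool>
        var point = new Point(2, 4);
        Console.WriteLine($"{Hello}: ({point.X}, {point.Y}) even? {isEven(point.X)}");
    }
}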
For more information, see the csharplang/proposals/ (GitHub) folder. Some of the C# 10 features are in the main folder rather than in the csharp-10.0/ folder.
Published by marco on 5. Jun 2021 22:33:53 (GMT-5)
Out of curiosity, I looked up how dependency injection works in functional languages. I stumbled upon this amazing article series—Six approaches to dependency injection by Scott Wlaschin (F# for Fun and Profit)—that presents five different techniques, from very simple and easily applicable to more complex, but potentially robust.
The article series applies various abstraction techniques to a program that reads input, processes it, and writes it out again. The reading and writing are impure operations and should be abstracted away to make it easier to reason about and test the actual program logic.
The first article details Dependency Retention (hard-code everything; appropriate for scripts and POC projects) and Dependency Rejection (make an impure/pure/impure sandwich that collects program logic in a testable “middle”).
The next article covers Dependency parameterization (passing as parameters and using partial application in a separate abstraction layer). These are all pretty usable techniques.
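To make the idea concrete in C# (a sketch of my own, not from the series): with dependency parameterization, the “dependencies” are just function parameters, so tests can pass fakes without any container.

using System;

public static class Program
{
    // The core logic takes its impure reader/writer as parameters.
    public static void Compare(Func<string> readLine, Action<string> writeLine)
    {
        writeLine("Enter the first value");
        var first = readLine();
        writeLine("Enter the second value");
        var second = readLine();

        writeLine(string.Compare(first, second, StringComparison.Ordinal) switch
        {
            < 0 => "The first value is smaller",
            > 0 => "The first value is bigger",
            _ => "The values are equal",
        });
    }

    // Production wires in the console; a test would pass in-memory delegates instead.
    public static void Main() => Compare(() => Console.ReadLine() ?? "", Console.WriteLine);
}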
The next two articles—The Reader Monad and Dependency interpretation—are more…involved. With both, you end up writing a description of your program that you can then execute by passing in the appropriate parameters. The dependencies are separate from the logic—kind of in a separate layer—but there are drawbacks to these approaches. For one, they are quite complex and require everyone on the team to understand the patterns very well.
This is an example of the program description using the Reader
monad.
The final article applies all of these techniques to a slightly more complex problem domain, namely a user-profile update that receives an update request, reads from a database, compares data to determine updates, and sends an email to confirm address changes. This is complex enough that we can see how the techniques scale. As expected, the more complicated but functionally pure Reader Monad and Dependency Interpretation examples take up 2/3 of the implementation and explanation (with the latter taking 50% all on its own).
All in all, this is impressive work that answered my question superbly. Highly recommended. I’ve only very lightly summarized the pros and cons and descriptions above. The original author does a superb job of explaining these in much more detail—without repeating himself.
Published by marco on 22. Apr 2021 18:20:27 (GMT-5)
Updated by marco on 11. Jan 2022 20:14:34 (GMT-5)
Over the last four months, I’ve been collecting interesting HTML/CSS techniques and ideas.
For both of these goals, I’m focusing on leveraging as much of the power of the browser—especially CSS/HTML—as possible without getting mired in too much JavaScript or client-side libraries.
To that end, I’ve collected the stuff I learned and would like to use in a hopefully semi-readable and searchable format. I tried to split it into coherent sections with supporting information and links. YMMV.
The following guides/manuals contain a wealth of information.
The article What Makes CSS Hard To Master has several interesting examples, but it mostly boils down to: “HTML documents are complex programs”.
It’s always been difficult to tell which styles are applied when—it’s a near-miracle that browsers correctly untangle the myriad ways that style rules interact with an ever-changing DOM and viewport size, to say nothing of doing so with such sheer alacrity.
There are selectors, media queries, related properties (e.g. position), CSS Properties, and much more, and all of it cascades with inheritance everywhere. At least most browsers now handle this similarly with predictable performance.
A tremendous amount of content is generated dynamically using layers of framework code, either on the server or the client. Any one of these moving parts could introduce a seemingly innocuous change that breaks the entire layout or inheritance (e.g. when a component introduces a wrapping <div></div> somewhere, either where it’s flatly invalid (e.g. in a table) or where it’s just unwanted (e.g. in a sequence of flexing containers, where the new container does not flex)).
To control this chaos, most designers and developers impose self-discipline and use guidelines to avoid confusion while still allowing them to leverage the power of CSS to be able to do what they want.
From the article linked above,
“I think mastering CSS comes down to having a good amount of knowledge about it, recognising the subtle dependencies between different declarations, rules, and the DOM, understanding how they make your CSS complex, and how to best avoid them.”
Congratulations: you’ve just described programming at anything but a trivial level of complexity. If a tool has power, then you have to understand it in order to avoid hurting yourself with it. That’s why “everyone codes” is a lost cause doomed to end in failure, broken dreams, and embarrassed disappointment, like so many other quixotic attempts to ignore immanent complexity.
CSS is a moving target. Things that used to be difficult are now easy. [1] But that’s the nature of the game: someone is going to abstract away the thing you spent time learning and make it easier for everyone else. That is the nature of abstraction and frameworks. If the new thing (e.g. grid) replaces the old thing (e.g. float) well and you have time and budget to use the new thing and it’s a priority then, by all means, upgrade to use the new technique and pay down some technical debt, while hopefully gaining some flexibility.
While CSS generators—pre-processors like LESS and SASS—are invaluable, they also introduce another layer of abstraction where code is generated for the developer—sometimes with unpredictable results.
The latest versions of CSS have included some of the features introduced in these generators. Vendor prefixes are less necessary than they used to be; CSS properties and variables and calc() (as well as other standard functions) allow a flexibility beyond even that offered by pre-processor variables. Color and transformation and animation functions are standard now.
Check out the site SmolCSS by Stephanie Eckles for a long list of common layouts, like:
It’s called “smol” because almost all of them do a lot of heavy lifting with very few lines of CSS.
The article Guide to Advanced CSS Selectors − Part One by Stephanie Eckles (Modern CSS) is a good overview with good illustrations and some selectors I’d never heard of, like the General Sibling Combinator, which “[f]or example, p ~ img would style all images that are located somewhere after a paragraph provided they share the same parent.”
That whole site is beautiful and exhibits an absolute mastery of CSS. Check out the use of the skew transform for the cards at the bottom of the page or for the whole series. The rainbow gradients on the :before and :after borders and backgrounds are a great idea and well-executed.
The excellent tutorial Diving into the ::before and ::after Pseudo-Elements by Will Boyd (Coder's Block) is an absolute treasure trove of information, including how to use the ::before/::after pseudo-elements to insert content, but also noting how a classic use of ::after can now be replaced with display: flow-root (the modern clearfix). He also covers ::marker.
The article Three important things you should know about CSS :is() (Bramus) gives a few caveats but also shows the power of this operator to reduce CSS clutter (along with the up-and-coming nesting feature described below). You can use where() (MDN) instead of is() (MDN) to keep the specificity contribution of the clause neutral. The has() (MDN) selector function is defined, but isn’t available anywhere. Combine any of these with not() (MDN) for even more powerful selectors.
The article CSS custom properties are not variables (Web Platform News) explains a common misconception about CSS “variables”.
“A custom property is not a variable, but it defines a variable. Any property can use variables with the var() function whose values are defined by their associated custom properties.
“[…] This distinction is useful because it allows us to talk about “variables with fallback values” (a custom property like any other property cannot have a fallback value) and “properties using variables” (a property cannot use a custom property)”
Another great article is The styled-components Happy Path by Josh W. Comeau, which discusses styling with CSS properties in React components. In it, he references another article of his, CSS Variables in React tutorial, which is more of an introduction to some of the techniques he works with in the first article.
You commonly declare properties with default values on the :root pseudo-class (MDN). [2]
The article What Can You Put in a CSS Variable? shows a lot of nice uses of CSS properties, variables, and functions (W3Schools). CSS Properties can basically hold anything you want: text, concatenated strings, references to variables, images via urls, a single value, multiple values, etc. [3]
“Some properties, like background and box-shadow, can take a list of things. You can use a CSS variable as a single item in the list, a sublist of the list, or the entire list.”
As mentioned above, declaring colors is one of the primary uses of a CSS pre-processor language. CSS Properties handle this job very nicely, without preprocessing and also with full recalculation at runtime. [4] The demo with the set of animated RGB sliders that control the color of a swatch is worth the price of admission. All without any JavaScript at all. Smooth as butter.
As a practical application, the article Make the page count of a 3D book visible using CSS Custom Properties shows how you can use CSS to make a “book” out of a div and a cover image, transforming it in 3D-space and then using a CSS property to determine how many “pages” it looks like it has.
You can play with a demo here.
You can find a simpler and very straightforward demo in the article Sharing data between CSS and JavaScript using custom properties by Christian Heilmann, which shows how to use CSS properties with a few lines of JavaScript to follow the cursor in your document.
Practical Use Cases For CSS Variables by Ahmad Shadeed provides many, many short examples and ideas for using custom properties as an abstraction instead of setting one or more standard properties directly.
Future work: See below for a discussion of proposed but not yet supported extensions and uses of CSS Properties.
The article The complete guide to CSS media queries is a great overview of how media queries work, but also how they’ve changed recently for those who’ve gotten accustomed to them over the years. For example, the section New notations in Media query levels 4 and 5 shows how ranges are easier, how you can now use or, the not() function, and custom media queries, which allow you to basically make aliases for media query combinations that you need to use in several places.
/* Define your custom media query */
@custom-media --small-screen (max-width: 768px);

/* Then use it somewhere in your stylesheet */
@media (--small-screen) {
}

/* You can also combine it with other media features */
@media (--small-screen) and (pointer: fine) {
  /* styling for small screens with a stylus */
}
The article Handling Text Over Images in CSS by Ahmad Shadeed gives a wonderful overview with many examples on how to use gradient overlays on images to make overlay text readable for all types of images. At the end, you can see how many sites are using this (including YouTube for its overlay video controls).
See also the gradients used in borders (hover over a “card”), headers, and other elements in the ModernCSS tutorial.
Check out Animating a CSS Gradient Border, which has no JavaScript. It leverages a newer feature of Chrome-based renderers to avoid writing a lot of keyframe boilerplate, but it’s all in CSS. You could write it all in bog-standard CSS.
Another example is a slide show written with only HTML and CSS. You can keep all slides in a single document and make animated transitions between them. See How to Play and Pause CSS Animations with CSS Custom Properties for ideas. The article An Interactive Guide to CSS Transitions provides a lot of background and interactive examples of how transitions work and how you can influence their behavior.
CSS animations apply to many, many properties—in all modern browsers—as detailed in the article The Surprising Things That CSS Can Animate by Will Boyd (Coder's Block), which shows how easy it is to animate box-shadows (for a “pulsating” effect) or even z-order, with a few other properties, to animate two items “switching places” in a very intuitive way—all without JavaScript.
The article Cooltipz.css — Pure CSS Customisable Tooltips by Bramus van Damme includes a good demonstration of Cooltipz. This library uses very modern, but well-supported techniques to place and format tooltips or flyouts (for non-desktop browsers).
Understanding Clip Path in CSS shows how to work with the standard shape functions and combinators and the clip-path property to make pure-CSS non-rectangular accents and effects that run on all modern browsers.
The article Responsible Web Applications by Joy Heron is an absolutely lovely design that illustrates the power and simplicity of pure CSS. Right at the very top, it uses shape-outside and circle to make text wrap elegantly around a circular shape that contains the navigation.
The key piece of CSS is very compact and understandable.
shape-outside: circle(21rem at 1.5rem 40%);
The page makes liberal use of CSS custom properties (see below) and rem units to make everything scale nicely. It’s kind of a master class in CSS and is well worth reading.
Speaking of clipping, you can assign the background-clip property to determine which part of its element a background covers. In particular, setting it to the value text clips the background to show through only in the area covered by text. It’s been supported for quite some time and allows developers to make dynamic effects that would otherwise have to be hard-coded in graphics.
The article CSS background-clip Demo: Text with Animated Emoji shows a neat demo of an animated SVG ghost moving back and forth behind clipped text.
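A minimal sketch of the technique (class name and gradient are arbitrary); note that several browsers still want the -webkit-prefixed form of the property:
.fancy-heading {
  background: linear-gradient(90deg, #e66465, #9198e5);
  /* Clip the background so it only shows through the glyphs */
  -webkit-background-clip: text;
  background-clip: text;
  /* Make the text itself transparent so the background is visible */
  color: transparent;
}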
In the same ballpark is the backdrop-filter property, which allows you to apply filters to everything behind a particular element. Naturally, you need to make the element at least partially transparent in order to see the effect.
The CSS is very simple and supported on all modern browsers. Being able to create this kind of composition dynamically on the client brings very nice effects without pre-rendered compositing.
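Something like the following (the class name and filter values are invented) is enough for a “frosted glass” panel:
.frosted-panel {
  /* Partially transparent so the blurred backdrop shows through */
  background-color: rgba(255, 255, 255, 0.4);
  /* Blur and slightly brighten whatever is rendered behind the element */
  backdrop-filter: blur(8px) brightness(1.1);
}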
CSS Paper Snowflakes combines transforms, clip-paths, mask-images, and tons of properties and variables to render what look like pre-built graphics using only CSS (well, SCSS in this case).
The article CSS mix-blend-mode not working? Set a background-color! (Bramus) illustrates how to use mix-blend-mode to make sure that text has proper contrast against whichever background it happens to be over.
This is a really nice effect and very handy for usability. You can have the browser ensure that text is always readable, regardless of what kind of background slides into place behind it.
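A rough sketch of the idea (selectors invented): the text inverts against whatever is behind it, and, as the article’s title says, the blend only works if there is an explicit background-color to blend against.
.overlay-title {
  color: white;
  /* "difference" inverts against the backdrop, so the text stays
     visible over light and dark areas alike. */
  mix-blend-mode: difference;
}

body {
  /* The article's point: without an explicit background-color,
     there is nothing to blend against and the effect silently fails. */
  background-color: white;
}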
The article Smooth Scrolling Sticky ScrollSpy Navigation provides a tutorial for building a JS-free TOC with sticky headers. The article Smooth Scrolling and Accessibility by Heather Migliorisi (CSS Tricks) provides some background, history, and advice on honoring user preferences.
The following CSS is enough to get started. The full demo shows how to use a little bit of JS with an IntersectionObserver to implement the ScrollSpy feature in just one line of code.
html {
scroll-behavior: smooth;
}
main > nav {
position: sticky;
top: 2rem;
align-self: start;
}
The article Using position: sticky to create persistent headers in long texts by Christian Heilmann provides a very minimal and highly re-usable example of using this feature for “sticking” headers to the top of the page when scrolling.
h1, h2, h3, h4 {
position: sticky;
top: 0;
}
And there’s also scroll-snap-type, scroll-snap-align, and viewport units (e.g. vw and vh) to basically make a slide show out of an HTML file without any JavaScript (demo, or another demo with some additional JS to highlight the displayed slide/image in a thumbnail browser).
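A minimal, hypothetical setup looks something like this: the container scrolls, and each full-height child snaps into place when scrolling stops.
.slides {
  height: 100vh;
  overflow-y: scroll;
  /* Snap each child slide to the top of the viewport */
  scroll-snap-type: y mandatory;
}

.slide {
  height: 100vh;
  scroll-snap-align: start;
}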
As for “sticky” or “stuck” elements,
“[…] there is one limitation: it is impossible to change the appearance of an element based on whether it is stuck or not, say with a pseudo-class :stuck. This is a general limitation of CSS. In this case, I recommend combining the benefits of position: sticky to keep the element sticking with IntersectionObserver to change its appearance (while taking care not to change its dimensions, to prevent content jumps).”
The article A table with both a sticky header and a sticky first column by Chris Coyier (CSS Tricks) provides a good example of using sticky to make frozen columns in tables.
For a really fancy scroll-spy, see the Progress Nav demo. This is very cool-looking, but it’s a little bit older, so also check out the Progress Nav with IntersectionObserver by Bramus for a linked version that does the same thing, but uses the IntersectionObserver to reduce the amount of code significantly.
For limiting text in a box, you can let the browser do all of the heavy lifting by using line-clamp or the even smoother and also standardized -webkit-line-clamp. See a demo that shows how to use it in a grid layout.
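The widely supported form is the legacy -webkit- syntax, which only takes effect together with a small cluster of other properties; a typical (illustrative) teaser style:
.teaser {
  /* The box/orient/overflow trio is required for the clamp to apply */
  display: -webkit-box;
  -webkit-box-orient: vertical;
  -webkit-line-clamp: 3;
  overflow: hidden;
}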
The line-clamp feature is not to be confused with the clamp() CSS function, which is shorthand for bounding a value between a min() and a max().
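For example, a fluid heading size that scales with the viewport but never leaves a sensible range:
h1 {
  /* Never smaller than 1.5rem, never larger than 3rem, fluid in between */
  font-size: clamp(1.5rem, 4vw, 3rem);
}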
There are a ton of CSS functions, for math, colors, filters, images, fonts, shapes, and more. You can use all of these with variables and custom properties to avoid whole swaths of JavaScript.
You’ll want to use minmax to override the default minimum size of auto, which is content-based sizing and can get quite large, leading to what the cool kids are calling a “grid blowout”. See The Minimum Content Size In CSS Grid (Bramus) for examples, graphics, and more links and guides.
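A sketch of the fix (the column sizes are arbitrary): swapping a plain 1fr for minmax(0, 1fr) lets the column shrink below its content’s intrinsic width instead of blowing out the grid.
.layout {
  display: grid;
  /* minmax(0, 1fr) overrides the auto minimum so wide content
     (long words, code blocks, wide tables) can't stretch the column. */
  grid-template-columns: minmax(0, 1fr) 20rem;
}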
The tutorial Building a Side Navigation pulls a lot of concepts together to create a common UI element that tends to become a time sink if you don’t plan correctly. A lot of the CSS features used in this article help to reduce the work significantly.
If you’ve ever wondered what you need <col> and <colgroup> for, then Highlighting columns in HTML tables by Manuel Matuzovic will show you how to use them to apply styling to a column without much additional markup. He even has an example that styles a “selected” column using the :target pseudo-selector.
You can also use a simple attribute to tell the browser to be proactive about loading images.
The article Alt vs Figcaption by Elaina Natario (ThoughtBot) nicely illustrates how well browsers now handle the figcaption tag, which is yet another feature I’d implemented on earthli long ago, but with custom HTML and extra containers and positioning code. It’s nice to know that I can replace all of that with a single tag that’s been supported for years.
Viewport units let the developer size elements based on the size of the viewport. This includes not only vw and vh, but also vmin and vmax, which are the minimum and maximum of the two viewport dimensions, respectively.
The article Simple Little Use Case for vmin by Chris Coyier shows a very simple way to make a highly responsive header without using media queries.
header {
  padding: 10vmin 1rem;
}
The article Accept several email addresses in a form with the multiple attribute (Bramus) shows you how to use the multiple attribute to have the browser automatically validate multiple email addresses, all without any custom JavaScript.
Once you’re using HTML validations (and you should), you can use the :invalid pseudo-selector to style elements that need correction. Form Validation: You want :not(:focus):invalid, not :invalid (Bramus) shows several ways of combining it with good UX to avoid annoying users with hyperactive validation messages.
A good setup is:
.error-message {
  display: none;
}

input:not(:focus):invalid {
  border-color: var(--color-invalid);
}

input:not(:focus):invalid ~ .error-message {
  display: block;
}

input:not(:focus):not(:placeholder-shown):valid {
  border-color: var(--color-valid);
}
There’s also the new :focus-visible pseudo-class to help perfect focus display in forms.
/* Hide focus styles if they're not needed, for example,
when an element receives focus via the mouse. */
:focus:not(:focus-visible) {
outline: 0;
}
/* Show focus styles on keyboard focus. */
:focus-visible {
outline: 3px solid blue;
}
See :focus-visible Is Here (Bramus) for more information.
Password controls need a bit more love, as documented in the article Perfecting the password field with the HTML passwordrules attribute by Scott Brady, which makes the case for a new attribute, passwordrules, to be standardized. His focus is on making password fields maximally accessible and usable for password tools.
A weaker—but available—alternative to his proposal is to use the pattern attribute to restrict input (helping the user, but not the password generator). To that end, he also mentions that you should set the autocomplete (MDN), autocapitalize (MDN), and autocorrect (MDN) (non-standard) attributes correctly instead of just leaving them at the defaults.
The resize (MDN) CSS property controls the directions in which the user will be able to resize any DOM element.
“The resize CSS property sets whether an element is resizable, and if so, in which directions.”
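For example (selector invented), to let users stretch a notes area vertically but not horizontally; note that resize only applies when overflow is something other than visible:
.notes {
  /* Allow dragging the element taller, but not wider */
  resize: vertical;
  /* resize has no effect with the default overflow: visible */
  overflow: auto;
}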
The article A Complete Guide To Accessible Front-End Components includes everything from guidance to links to tutorials to full-fledged examples and screenshots of HTML/CSS/JS implementations of commonly used controls that are also accessible.
The “Tab Panel” is quite nice in that it responsively switches to an accordion at smaller widths.
The article Building a Settings component by Adam Argyle (web.dev) demonstrates accessible components using a lot of pretty advanced—but generally available—techniques, like properties, grids (w/ align-items, vw, minmax, auto-fit for pretty much automatic responsiveness with nearly no code), dark/light theming, light JS manipulation of controls, FormData, accent-color, and much more. Watch the embedded video (YouTube) for a very quick, 8-minute overview, play with the live demo or grab the source (GitHub).
Styling: Styles Piercing Shadow DOM shows you how to reset all styles in your component, using the :host pseudo-selector.
:host {
/* Reset specific CSS properties */
color: initial;
/* Reset all CSS properties */
all: initial;
}
The article Options for styling web components by Nolan Lawson (Read the Tea Leaves) shows how to design a styling API for a web component using CSS custom properties.
The article Creating Custom Form Controls with ElementInternals by Caleb Williams (CSS Tricks) introduces an interesting concept. The example it uses is to make a single “control” that holds several text inputs, which isn’t groundbreaking, but it does show the power of packaging CSS/HTML/JS as components that show up as simple tags with properties.
None of that is new—we’ve had web components for a while now—but ElementInternals allows deep integration into the form’s workings, including hooking validation, submitting, drawing, and so on.
inherit value
The inherit value is not new, but I often forget to use it as intended. It’s meant to help avoid re-stating a base color.
The following example changes the color for nav tags to red, but wants links to retain the original color.
body { color: black; }
nav { color: red; }
nav a { color: black; }
Instead of repeating the value black, you can use inherit.
body { color: black; }
nav { color: red; }
nav a { color: inherit; }
The initial value is also useful.
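One illustrative use, continuing the example above: instead of inheriting the red from nav, a link can fall back to the property’s initial value and ignore the cascade entirely.
nav a {
  /* Discard the inherited red and use the property's initial value
     (for color, whatever the user agent defines as the default). */
  color: initial;
}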
content-visibility
The article content-visibility: the new CSS property that boosts your rendering performance discusses a very new feature. It landed in official releases of Chrome, Opera, and Edge in September 2020.
“The content-visibility CSS property controls whether or not an element renders its contents at all, along with forcing a strong set of containments, allowing user agents to potentially omit large swathes of layout and rendering work until it becomes needed. Basically, it enables the user agent to skip an element’s rendering work, including layout and painting, until it is needed, making the initial page load much faster.”
Related to this newer property are the existing will-change, object-fit, and contain. See contain-intrinsic-size (MDN) and content-visibility for more information.
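A typical (illustrative) use is deferring rendering work for sections far below the fold; the size hint keeps the scrollbar from jumping while the content is skipped.
.below-the-fold {
  /* Skip layout and paint for this section until it nears the viewport */
  content-visibility: auto;
  /* Reserve an estimated height while the contents are not rendered */
  contain-intrinsic-size: 1000px;
}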
box-decoration-break
The article box-decoration-break helps to define how elements should be rendered across lines by Stefan Judis presents an interesting property that lets you determine how padding, border, and other properties are applied to inline elements that span multiple lines.
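A small illustrative example: with clone, a highlighted phrase that wraps across lines gets its padding, background, and rounded corners repeated on every line fragment instead of only at the very start and end.
.highlight {
  background: #fde68a;
  padding: 0.2em 0.5em;
  border-radius: 0.25em;
  /* Repeat the decoration on each line fragment of the inline box */
  -webkit-box-decoration-break: clone;
  box-decoration-break: clone;
}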
Instead of setting arbitrary z-indexes in your styles, sometimes the isolation property is a better way of creating a stacking context (MDN).
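For example (selector invented), a component can contain its own stacking without inventing a magic z-index:
.card {
  /* Creates a new stacking context, so z-indexes and blend modes
     inside the card can't interfere with the rest of the page. */
  isolation: isolate;
}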
The Page Visibility API (MDN) is available in all browsers and provides a high-level API for running code when showing or hiding a page.
“With tabbed browsing, there is a reasonable chance that any given webpage is in the background and thus not visible to the user. The Page Visibility API provides events you can watch for to know when a document becomes visible or hidden, as well as features to look at the current visibility state of the page.”
Pages can use this to “pause” activity when they’re in the background (e.g. server polling or animations). In the case of animations, though, “Most browsers stop sending requestAnimationFrame() callbacks to background tabs or hidden <iframe>s in order to improve performance and battery life.” They also “throttle setTimeout()”.
The CSS Houdini (MDN) APIs are a low-level way to hook custom JavaScript into various parts of the rendering pipeline. Of particular interest is the part that’s finished and implemented in all browsers: the CSSOM (CSS Object Model) and Houdini, which let a page render custom CSS effects using JavaScript. The collection of low-level APIs is known by the umbrella term Houdini, described in Cross-browser paint worklets and Houdini.how.
From the MDN page linked above:
“Houdini is a set of low-level APIs that exposes parts of the CSS engine, giving developers the power to extend CSS by hooking into the styling and layout process of a browser’s rendering engine. Houdini is a group of APIs that give developers direct access to the CSS Object Model (CSSOM), enabling developers to write code the browser can parse as CSS, thereby creating new CSS features without waiting for them to be implemented natively in browsers.”
And:
“Houdini enables faster parse times than using JavaScript style for style changes. Browsers parse the CSSOM — including layout, paint, and composite processes — before applying any style updates found in scripts. In addition, layout, paint, and composite processes are repeated for JavaScript style updates. Houdini code doesn’t wait for that first rendering cycle to be complete. Rather, it is included in that first cycle — creating renderable, understandable styles. Houdini provides an object-based API for working with CSS values in JavaScript.”
Houdini.how is a collection of open-source CSS extensions that you can use, extend, and learn from. I heard about this from css-houdini-circles — A Houdini Paint Worklet that draws Colorful Background Circles by Bram Van Damme (Bram.us) (see his code (GitHub)).
The following video provides an excellent overview in 12 minutes.
Once you start making custom effects, you’ll run into classic rendering problems, one of which is addressed in the article CSS paint API: Being predictably random, which explains how to use a stable seed to generate predictably random data for animations.
While the painting API is relatively well-supported, the Layout API is still in early days.
“The layout stage of CSS is responsible for generating and positioning fragments from the box tree. […] This specification describes an API which allows developers to layout a box in response to computed style and box tree changes.”
The VisBug Chrome/Opera/Edge Extension is an excellent tool in general, but seems to be indispensable for optimizing Houdini code.
Skip to 23:25 for the VisBug demonstration.
The future of CSS: Higher Level Custom Properties to control multiple declarations by Bramus Van Damme covers a very recent proposal (December 2020), discussed in detail in the issue [css-variables?] Higher level custom properties that control multiple declarations #5624 (GitHub).
The article @property: giving superpowers to CSS variables by Una Kravets (web.dev) provides more examples.
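A minimal sketch of what registering a property looks like (the property name is invented): declaring a type lets the browser interpolate the value, so it can be transitioned instead of being treated as an opaque string.
@property --glow-angle {
  syntax: "<angle>";
  inherits: false;
  initial-value: 0deg;
}

.card {
  background: conic-gradient(from var(--glow-angle), #e66465, #9198e5, #e66465);
  /* Works because --glow-angle is registered as an <angle> */
  transition: --glow-angle 0.5s ease;
}

.card:hover {
  --glow-angle: 180deg;
}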
Another interesting up-and-coming development is container queries (Bram.us), which are like media queries, but address the nearest “root” container among the parent containers of the element to which they’re applied. The article CSS Container Queries: A First Look + Demo takes you step by step through using them. Basically, you write @container (min-width: 38rem) instead of @media (min-width: 38rem) and assign the contain property, like so: contain: layout inline-size.
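Putting those two pieces together, a hypothetical card component might look like the following. This mirrors the early syntax described in the article and may well change as the specification evolves.
.card-wrapper {
  /* Opt this element in as a query container (early proposal syntax) */
  contain: layout inline-size;
}

@container (min-width: 38rem) {
  /* Applies when the wrapper, not the viewport, is at least 38rem wide */
  .card {
    display: flex;
    gap: 1rem;
  }
}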
The article Say Hello To CSS Container Queries by Ahmad Shadeed provides a lot of real-world examples that will make you wonder how we’ve lived with only viewport-based media queries for so long.
One of the main values added by a CSS pre-processor like LESS is nesting, which improves clarity and cuts down on duplicated definitions. The article The future of CSS: Nesting Selectors by Bramus indicates that this feature is coming to mainline CSS, as documented in the CSS Nesting Module (W3C). The document is an editors’ draft, so there’s still quite a way to go.
Nested Media Queries are already supported, though more as a side-effect of the implementations, not necessarily because it was specified that way.
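A small sketch of what the draft syntax looks like (subject to change while the module is still an editors’ draft):
.card {
  padding: 1rem;

  /* & refers to the parent selector, as in SCSS/LESS */
  & .title {
    font-weight: bold;
  }

  &:hover {
    box-shadow: 0 0 0.5rem rgba(0, 0, 0, 0.2);
  }

  /* Nested media query: applies to .card at small widths */
  @media (max-width: 30rem) {
    padding: 0.5rem;
  }
}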
The “logical properties” feature adds aliases for some of the venerable CSS properties like margin-right and margin-left that make it easier to build more agnostic and flexible content using, e.g., margin-inline-start and margin-inline-end. Assigning one of these instead of a hard-coded side means that a style will work in both LTR and RTL layouts (for example).
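For example (selector and values invented), a quote indented from the start of the line works unchanged in both LTR and RTL documents:
blockquote {
  /* Instead of margin-left, which would be wrong in RTL layouts */
  margin-inline-start: 2rem;
  /* Instead of padding-left and padding-right */
  padding-inline: 1rem;
  /* Instead of border-left */
  border-inline-start: 3px solid gray;
}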
The article Digging Into CSS Logical Properties by Ahmad Shadeed provides many more examples. The full list of proposed properties (MDN) is quite extensive. Many of the newer modules like flexbox and grid were designed like this from the very start.
See also CSS Logical Properties Are the Future of the Web & I18N by Daniel Yuschick for more information and tons of examples, with a demystification of the difference between direction (inline axis, or flow) and writing-mode (block axis).
Update: 16.10.2021
Two more interesting logical properties are inline-size (MDN) and block-size (MDN), which correspond to width and height in the horizontal-tb writing-mode (MDN). Using the logical properties means that the layout works even if the writing mode is changed to vertical-lr or vertical-rl.
The article Hands-on with Portals: seamless navigation on the web explains how this new feature in Chrome/Chromium improves support for securely embedding content from other sites (i.e. “portals”), as when using OAuth providers. It also generally improves transitions in MPAs (Multiple Page Applications) by allowing one page to prepare another rendered page in memory and then transition to it and perhaps even back.
“Single Page Applications (SPAs) offer nice transitions but come at the cost of higher complexity to build. Multi-page Applications (MPAs) are much easier to build, but you end up with blank screens between pages.
“Portals offer the best of both worlds: the low complexity of an MPA with the seamless transitions of an SPA. Think of them like an <iframe> in that they allow for embedding, but unlike an <iframe>, they also come with features to navigate to their content.”
The article Page Lifecycle API by Philip Walton (Google Developers) discusses an improvement over even the “Page Visibility” API (discussed above). Instead of just handling visibility, it also provides hooks for suspending and resuming pages.
“The Page Lifecycle API, shipping in Chrome 68, provides lifecycle hooks so your pages can safely handle these browser interventions without affecting the user experience. Take a look at the API to see whether you should be implementing these features in your application.
“[…] While the web platform has long had events that related to lifecycle states — like load, unload, and visibilitychange — these events only allow developers to respond to user-initiated lifecycle state changes.”
:root, then check out the list of Pseudo-elements to see which extra parts of a document you have access to with CSS (e.g. the ::file-selector-button selector is a relatively new addition that lets you style the button in an upload control).
Over the last several years, I’ve used many other IDEs, like Visual Studio Code for documentation, advanced search, and JavaScript/TypeScript or PHPStorm for PHP, Android... [More]
Published by marco on 18. Apr 2021 22:50:04 (GMT-5)
Updated by marco on 23. Apr 2021 08:59:44 (GMT-5)
Visual Studio with ReSharper has been my main development tool for many, many years. I first started using it in 2008 or 2009.
Over the last several years, I’ve used many other IDEs, like Visual Studio Code for documentation, advanced search, and JavaScript/TypeScript or PHPStorm for PHP, Android Studio for Java/Android, XCode for Swift/iOS, or WebStorm for TypeScript/JavaScript.
JetBrains Rider came on the scene several years ago and was not, at first, a viable alternative, but it has gotten much, much better. It now makes sense to consider using Rider as well as or even instead of Visual Studio/R#.
Before going into the new setup, let’s briefly discuss what we were replacing.
.EditorConfig used only lightly
All inspections and quick-fixes run through ReSharper. Visual Studio “squiggles” are disabled because they’re distracting and contribute nothing additional. StyleCop does a lot of the heavy lifting, but it does a bit too much. It checks spelling in documentation, even though ReSharper already does that natively.
The biggest drawback is that StyleCop uses its own parser, which is not just detrimental to performance—the Roslyn parser, the ReSharper parser, and the StyleCop parser are all running at the same time—but the StyleCop parser is also no longer compatible with some features of C# 8 and 9. It records “syntax errors” for perfectly valid code.
Rider doesn’t support the StyleCop, ReCommended, or the Enhanced Tooltip extensions. Not having Enhanced Tooltip isn’t that big a deal (Rider’s tooltips are OK), but not having StyleCop and ReCommended meant a significant number of style and formatting inspections were not applied in Rider.
Rider supports style and formatting, but it doesn’t warn or indicate when there are issues. This makes it more difficult to help developers use a common style.
StyleCop.Analyzers
The StyleCop.Analyzers project has been around for a while, but making the move is not as straightforward as just installing the package in all projects. You also have to rewrite the configuration. Luckily, they have a good template from which to start and the documentation is very good.
Since the test solution uses Directory.Build.Props, it was also very easy to include the assembly and configuration for all projects. I created a special version for test assemblies that removes the documentation requirement.
StyleCop.Analyzers has its own JSON configuration, but it uses the .NET-standard rulesets to configure inspection severities.
Removing the StyleCop plugin for ReSharper was not without drawbacks; it removed a few minor goodies to which I’d grown accustomed:
Update 22.04.2021: I’ve since discovered that “chop” is available in Visual Studio by positioning on a method, pressing Ctrl + ., and choosing one of the many wrapping options.
Also, documentation-generation is getting better with each point release.
.EditorConfig
Another standard is using the .EditorConfig file for as much configuration as possible. This format is not IDE-specific: Visual Studio, ReSharper, Rider, Visual Studio Code, and many other editors/IDEs make use of it. Keeping as many settings as possible in this file helps ensure that style and formatting are applied correctly no matter which IDE is used. It’s not a guarantee, but there’s a better chance than if these settings are stored in a ReSharper-specific format, as before.
These days, a lot of the configuration can be stored in an .EditorConfig file—all but a handful of the Rider and ReSharper settings are mapped there already and there are a few more with each release.
Directory.Build.Props
I’m also using SDK-style project files together with the Directory.Build.Props feature of the MSBuild system to consolidate configuration to just one or two files.
Visual Studio:
Rider:
Shared:
.EditorConfig used for nearly everything
I have not tested Visual Studio without ReSharper because, although Visual Studio has leapt forward in functionality, there are still too many features I miss without ReSharper. [2]
I use a separate Git client called SmartGit, so I generally turn off as much of the Git integration as possible to save power and memory. The CodeLens (VS)/Code Vision (Rider) is an amazing insight into a ton of statistical information, but I don’t ever use it, so I turned it off. Also, I don’t like how it feels when editing code because it introduces virtual “lines” in too many places. I also would sometimes inadvertently click the links and then have to close detail panels or refocus the editor.
For the same reason, I disable almost all inlay hints in Rider/ReSharper (inline hints in Visual Studio). I do not miss seeing types everywhere. I only care what the actual types are when something doesn’t compile. In Rider, you can long-hold the Ctrl key to show inlay hints on-demand. The only inlay hint I always show is for inherited attributes (e.g. for [NotNull]
annotations).
I’ve also disabled Code Folding (Rider)/Outlining (Visual Studio) because I never use it. I don’t need to see the noise along the left-hand gutter and I don’t need to accidentally click the nodes (or accidentally trigger a folding with an inadvertent key combination).
These are options that I ended up changing from the defaults.
For C# Code style, I ended up adding these extra settings. There are probably others, but these are the ones that made ⌘ + K / ⌘ + D usable for me, especially for the single-line null-check statements that we use a lot.
With the first two settings, the formatter won’t fix some things that it would have fixed before, but it’s also not going to change a whole bunch of stuff that you’d rather it left alone.
It took me a few tries to configure Ctrl+K/Ctrl+D (format document) in Rider, which doesn’t work as loosely as in ReSharper/Visual Studio. In Visual Studio, it leaves single-line argument checks alone. Rider is more … consistent … and reformats all lines, which messes up a lot of formatting.
On the positive side, the configuration for Rider ended up improving “Code Cleanup” in Visual Studio/ReSharper, which had never worked so well before. I eventually figured out how to set things up so that “Format Document” and “Code Cleanup” (Ctrl+E/Ctrl+F) both work flawlessly in Rider and Visual Studio, but it took some time and patience to find all of the settings. The “Detect Formatting Settings” in both ReSharper and Rider were indispensable.
I also finally configured the “File Layout” feature so that “Clean Up Code” works as expected. StyleCop Analyzers supports enforcing an ordering on members, but it doesn’t support configuration of that ordering. The order is fixed as StyleCop wants it. Their default style has fields at the top, which is a no-go for our style.
That means that I’ve disabled the “arrangement” feature of StyleCop and no longer see warnings about out-of-order members. This is OK, though, as re-ordering members just to fix a warning is not that great for reviews and merging. “Clean Up Code”, however, does apply the file-layout rules.
I think that this is a better balance overall, as leaving a method in place when you’ve changed its visibility from public to protected (or vice versa) should not earn a warning.
As noted above, I configured all of the StyleCop, .EditorConfig, and Rider/R# settings to make “format document” and “clean up code” work perfectly with our style. These are just a jumping-off point (even within Encodo). Adjust StyleCop inspection severities in the *.ruleset files.
Adjust formatting preferences in the .EditorConfig whenever you can. Rider/ReSharper will also allow you to override these settings, storing them in the *.sln.DotSettings file, but it’s clearer and more consistent to configure the ruleset and .EditorConfig files because those are more human-readable and better-documented than the *.sln.DotSettings file.
I made this comparison over the last 4 months, during which the setup changed slowly into the configuration outlined above. I have tried to weed out the notes and impressions that no longer apply, but I may have missed some. I do my best to give the impression of what it’s like to work with these IDEs. I left some longer descriptions in place, just to give a feel of what I experienced while using the IDEs.
For small-to-medium projects on my 4-year-old desktop, you barely notice startup. For the larger Quino project, with over 120 projects (for now), startup speed is more noticeable.
All of the IDEs start relatively quickly now. They’re just fast in different places. It really depends on where your focus is. Visual Studio by itself starts very, very quickly. The latest versions of ReSharper start up in parallel, so VS is on the screen and the editor is typable in seconds, even with a solution like Quino. You can’t search at that point, though. [3]
Rider looks like it’s totally up and running, but it mostly can’t search either, not until the projects have been processed and the indexes loaded. The initial Rider project-chooser takes longer to start up than you’d expect. Once it’s up, though, opening a solution from there is very fast. Rider runs all open solutions in a single process. Visual Studio launches a separate process per solution.
While I’m happy that the startup speed has improved all-around, I don’t really care about startup speed, not really. I never reboot unless I have to. I never log out unless I reboot. I just leave my tools running all the time. I have 32GB of RAM. Once it’s running, it’s running, and I don’t care how much RAM it takes (within reason)—I care how fast it does the things I ask of it.
Once I configured StyleCop.Analyzers, my initial solution-load in Rider showed a shocking amount of memory for Quino (an extra 4.5GB just for the Roslyn checker process). It felt fast enough, even though the memory usage kept growing. Rider’s a 64-bit process and I have 32GB of RAM on my desktop, so it was a luxury I could afford.
Luckily, after a restart, the memory usage was still higher than before, but now stable at around 3GB.
Solution-wide analysis is enabled by default in Rider, with no performance degradation noticeable at all. In fairness, there is little to no performance degradation evident with ReSharper in Visual Studio either.
Code Vision is enabled by default in Rider; also no performance-degradation noticeable. I am running everything on a desktop and I have seen CPU usage spike quite high on Rider. Code Lens in Visual Studio and Code Vision in Rider both probably suck the life out of a battery, though. TANSTAAFL.
While it’s nice that Rider uses all available CPU power for certain tasks—e.g. building—I imagine that the CPU fan would be running a lot under heavy usage. Visual Studio probably suffers the same, though its CPU usage seemed to be flatter when I checked.
Solution-reloading is more stable and a bit faster than in Visual Studio. In a recent task where I was constantly cherry-picking and rebasing, making changes to project files and the solution file, Rider just worked. Visual Studio would usually throw up a yellow warning bar at the top sooner or later (usually sooner).
Sometimes, Rider is quite slow at getting its “intention actions”, something I’ve never seen with ReSharper.
This usually clears up after 5-10 seconds, but a couple of times, Rider went looking for inspections for 10 seconds and came up with nothing—repeatedly. It’s odd because, in that case, Rider kept having trouble with the same extension-method call and had to look it up again and again. This effect is noticeable in other places, as well. When you elect to show the dialog to “Configure Inspection Severity”, then sometimes it takes several seconds to show the dialog box (with no user feedback).
And, sometimes, Rider just dies. For example, when I look up sources for a .NET type, like IndentedTextWriter, by using ⌥ + F12. Rider showed a dialog for several seconds, but didn’t seem to be doing anything. It wasn’t downloading, as expected; instead, it just showed “Searching for implementations…”.
This wouldn’t be worth mentioning but, after having dismissed the dialog, now I can’t navigate to anything with F12. I have to restart Rider. This is not the first time that this has happened. This never happened with Visual Studio. It definitely makes the IDE feel much shakier.
In Visual Studio, with R#, I can view the sources for IndentedTextWriter after only a slight pause.
On the subject of reloading: Visual Studio definitely still freezes more (usually showing its yellow warning bar at the top after a few seconds), while Rider is just more subtle about looking loaded while still being unusable. You have to keep an eye on the progress bar at the bottom in both IDEs. In general, Rider reloads more quickly than Visual Studio—and has no UI “hangs”, like VS still does, for a few seconds—but not always.
On the other, other hand, I’ve also experienced more build errors after changing framework targets than with Visual Studio. Rider can’t copy files or it’s looking in the wrong place for files. Restarting Rider fixed that problem, but I shouldn’t have to restart to fix a build. Rebuild should have fixed it, but it didn’t.
I was unable to get Rider to respect the generated_code setting from the .EditorConfig file, something that worked immediately with Visual Studio/Roslyn (ReSharper is not involved). I’ve reported that issue as RIDER-61283. In the meantime, I’m using the “Elements to Skip” feature to ignore the same file masks Rider should be ignoring anyway. That at least works for now.
Still, Rider’s integration is nice because it pulls everything together into a single list, but its quick-fixes for Analyzer inspections aren’t as strong as Visual Studio’s nor can you actually fix everything (see the issue with UTF8 below).
In Visual Studio, the analyzers work quite well, but there is no integration with ReSharper. Instead, the integration with Visual Studio is really good—with Ctrl + . instead of ⌥ + ⏎, you can get quick fixes and even apply them to the entire method, document, project, or solution.
In Visual Studio, there’s a very nice preview mode. In fact, there is useful and accurate user feedback throughout, which was a pleasant surprise. It’s quite fast in collecting fixes for all 120 projects and applying the changes. There’s even good keyboard support for arrowing to the file/project/solution actions. This is a definite boon for getting through thousands of fixes quickly.
In Rider, there are quick fixes, but most of them only work for a single instance of the inspection. Some of the fixes (e.g. each attribute on its own line) can be applied to file/project/solution with ReSharper as well, but not all. Some of the fixes aren’t available at all with ReSharper (e.g. SA1513, insert newline after brace) but are available in Rider.
So, Visual Studio’s integration with Code Analyzers worked better out of the box, but it forces you to use both ReSharper quick fixes (⌥ + ⏎) and VS quick fixes (⌘ + .), depending on which system detected the issue. The inspections also show up in two different panes. This is actually easier to get used to than it sounds, though.
There is no ReCommended extension for Rider (with no plans to add support, according to issue #51: Add support for Rider 2020.2, which was closed as “too much work”). All of these inspections are missing in Rider.
async/await usage
When you add a parameter to the constructor, Rider doesn’t mark the identifier as unused if it has an attribute. In the examples below, you can see that the identifier is grayed out in Visual Studio, but not in Rider.
Sometimes Rider doesn’t indicate when a conditional access is unnecessary (e.g. when ?. can be converted to .). It also doesn’t indicate when an expression that is always false or true could be simplified as reliably as ReSharper does.
Neither Rider nor ReSharper seems to notice when you do a silly pattern-matching check, like if (sender is Person person) when person is already a Person. VS, Rider, and ReSharper simply assume that you’re doing the check in order to assign the variable, I guess.
Now I know why the solution-wide analysis is so fast in Rider: It doesn’t reevaluate warnings when the project changes (e.g. if you change the root namespace). You have to visit each file individually for it to clear the warning. Clicking “Reanalyze all files with errors” doesn’t work on files with warnings, as it does under ReSharper.
You can use ⌥ + ⇧ + PgDn to jump through the warnings, opening each file as you go. It’s pretty fast, but feels clunky. This is especially unfortunate when Rider thinks that there are errors. I suppose that this is a side-effect of repeated solution/project reloads as I’m quickly switching branches.
Changes to the ruleset and stylecop settings are noticed in both IDEs instantly. I changed a rule from warning to info and Rider changed the color of the squiggle in what felt like less than a second. Unfortunately, changes to the .stylecop.json file are not picked up without a reload of the solution.
Here is where ReSharper is perceivably faster than Visual Studio. It’s even a bit faster than Rider. Turn on solution-wide analysis. Remove the last reference to a function. Watch ReSharper gray out the identifier in the declaration nearly immediately. Or remove a method call. Watch ReSharper underline it immediately. Visual Studio/Roslyn? Still feels laggy.
ReSharper’s list of errors and warning updates immediately. Rider’s is pretty good, too, but, mysteriously, not as accurate or quick-to-update as ReSharper’s. Both are much faster than Visual Studio/Roslyn, which often takes long seconds to clear warnings or errors—and sometimes never does, until you force a build.
Roslyn (Visual Studio) is sometimes flaky and won’t clear old warnings/errors until the next build. ReSharper was definitely faster here, even with the extra StyleCop parser. This didn’t use to be an issue, but with the switch to Code Analyzers, I’m now using Visual Studio/Roslyn for a good portion of my inspections (StyleCop).
What does flaky mean? Whereas Rider updates relatively reliably when you make a change in any file, StyleCop Code Analyzers in Visual Studio will only occasionally show the warnings. If the file isn’t open (or in some sort of in-memory cache), then only a “Rebuild All” will make the warning appear. This also only works if you’re not using “ReSharper Build”.
Rider does this much less often, but it still does occasionally have incorrect inspections that can be very difficult to correct. For example, the following screenshots show an unrecognized dictionary.
Visual Studio recognizes the using System.Collections.Generic, but Rider grays it out.
Restarting Rider sorted out this error. Several other cached errors and warnings disappeared with the one noted above.
Rider is very quick, as is ReSharper. Also, it’s generally pretty good on updating inspections, but I’ve also seen flakiness with lingering warnings and errors in the pane, but never in the sources. The only way I’ve found to update the pane is by actually opening the file, at which point Rider re-detects that the issues are gone and clears the inspections. Manually triggering a reanalysis does not help here.
The solution-wide find/replace window in Rider is lightning-fast and supports newlines, copy/paste, and regular expressions, and shows change previews. It’s wonderful. The change previews in Visual Studio Code are just a tiny bit better, but the overall experience is solid and super-fast. The search/replace in Visual Studio looks very dated next to this feature in Rider.
Navigation to other files is so fast in Rider that I sometimes thought it hadn’t navigated (it had!)
There is no way to navigate the warnings in a solution using the keyboard. In general, Rider tends to let panels “steal” the keys for next/previous, so when you try to navigate errors or warnings or find-results, the test session can “steal” these keys and suddenly you’re navigating tests and fixtures instead. I find myself grabbing the mouse more often in Rider than I do in Visual Studio.
Where ReSharper has Ctrl + T as a central search for everything, the same key combination does not include “search everything” in Rider. For that, you need to switch to Ctrl + ⇧ + F. On the other hand, the dedicated “find in solution” panel is lightning fast and makes up for having to switch between panes.
Rider doesn’t really support extending a non-contiguous selection. It has column-selection mode, like Visual Studio, but it doesn’t have ⌘ + Shift + . to select “like” text. In Sublime Text and Visual Studio Code, this feature is available via ⌘ + D. Rider doesn’t seem to have this, which limits editing capabilities. There is documentation for multi-selection, but the shortcut keys are confusing and not the ones I have assigned. Nor can I find anything in the keymap with any of those names. It’s either a new feature or it’s only partially supported.
Update 23.04.2021: I just tried ⌘ + Shift + . in Rider (even though that wasn’t documented) and it works just like in Visual Studio! That’s a nice surprise. I’m not sure if this was always there and just poorly documented or whether they just added it in a recent release. At any rate, good news for editing in Rider.
Pressing Ctrl+K/Ctrl+C comments code. However, instead of commenting again, it uncomments if applied a second time. This means I can’t “double comment” to indicate that this code is temporarily preserved, but should not be flagged as commented code to be removed.
Double-clicking on an identifier uses CamelHumps, if you have CamelHumps enabled (just like all other JetBrains tools). With ReSharper, though, the CamelHumps apply to cursor-based word-selection, but a double-click selects the whole word. I think that’s a better balance because that’s what I expect when I double-click an identifier. I don’t think I’ve ever wanted to select just a part of the double-clicked word by default. It’s not a deal-breaker, but it’s annoying because I have to double-click, then extend the selection manually to get the full identifier.
The undo function in Rider fails much more often than I’m used to from Visual Studio. I’ve deleted lines of documentation and then hit undo and Rider couldn’t get them back.
Once the undo buffer is broken, you have to restart Rider in order to be able to undo again. It feels quite unstable. I’m quite surprised, considering the literally dozens of popular IDEs built on this platform.
Rider creates files as UTF-8, but without the BOM. Then the StyleCop analyzer demands that the file have a BOM, but there is no quick fix in Rider for this, nor is it clear how to convert the file. I end up switching back to Visual Studio, where there’s a quick fix to set the encoding properly.
Typing speed is better in Rider than in Visual Studio/ReSharper. Just a little, but it is. It’s smoother. Even after replacing the StyleCop extension with StyleCop.Analyzers, it still feels a bit smoother, overall. Rider on Mac feels even smoother than on Windows.
I just wasted 10 minutes in Visual Studio trying to figure out from the documentation how to create a StreamWriter with a non-default encoding. The list of overloads did not show any overloads when using a path.
I searched, and the wizards at StackOverflow rather snippily asked why not just use the docs. So I looked at the docs and then switched to the right target (first .NET 2.1, then .NET Standard 2.0), but the desired overloads have been around forever. Back to VS and it is really not showing those overloads. Switch to Rider and … there they are.
It turns out that Visual Studio has a maximum height for its overloads list. The only hint that there are more methods are some heretofore not-noticed dashes at the bottom. The only way to see the other overloads is to select the popup and use the arrow keys. There is no scroll bar or other evidence to indicate that this is possible. There is also no reason why the popup couldn’t be taller.
In Visual Studio, the developer can use the up arrow and down arrow to traverse the various overloads, showing the documentation for them. In Rider, it’s not obvious how to navigate. The trick is to keep hitting ⌘ + ⇧ + space to cycle forward through the list.
Typing a { in a non-interpolated string does not show code-completion. In ReSharper, you can type {, select a variable, and ReSharper automatically makes the string interpolated. If you add a parameter, Rider rightly complains that the data between the curly braces needs to be an index, but doesn’t offer to convert the string to interpolated. You have to go back to the front of the string and add the $ yourself. This is now working in Rider 2020.3.
Rider doesn’t offer to rename related symbols as much as ReSharper does. For example, if you rename a field, ReSharper will offer to rename the constructor parameter that sets that field. Rider does not.
When you insert a new parameter in a method call and then tell Rider to add it to the method, it then shows a panel with other calls that need to be updated, asking how to handle each one. This is the same as in ReSharper and is a welcome feature. As in ReSharper, you can navigate the various calls with the arrow keys and the focus is set correctly. However, I can’t figure out how to activate the choices with the keyboard. I have to use the mouse.
The NuGet integration is nice in Rider and the NuGet Explorer is quite fast. It still doesn’t feel as robust as Visual Studio, but it’s getting there. I rarely went back to Visual Studio to try to resolve an issue I couldn’t solve in the Rider UI.
Rider’s “build” command still doesn’t notice when you’ve changed packages external to the solution and do a nuget restore for you. In fact, when I updated Winform DevEx packages externally (because neither the NuGet UI in Rider nor that in VS could apply the changes without getting tripped up in dependencies because it can’t upgrade multiple projects at once), Rider had no idea what I’d done until I manually deleted the obj folders from the projects that depend on DevEx.
I don’t recall having to do that for Visual Studio, which runs a nuget restore check before each build. Visual Studio was more amenable to finding the actual error with a “rebuild all”. Rider cached more and stayed stuck on the original “error”, which was hiding the real problem (an interface mismatch after the upgrade).
When you update NuGet packages, Rider uses stale data a lot more than Visual Studio does now. This is how Visual Studio used to be, but it’s gotten a lot better with its caches. Rider is still a few steps behind. I just upgraded NuGet packages for a project and then ran the tests. A bunch of them failed with a MissingMethodException.
I know this error, so I forced a full rebuild and ran the tests again. This time everything worked. With Visual Studio, I’d gotten used to no longer having to consider “rebuild all” or “restart the IDE” as possible solutions. With Rider, you still have to occasionally use these solutions, for now.
It’s not the end of the world, but it does waste time and effort—especially if you don’t jump to that conclusion quickly enough. Often enough, you’ll lose a good quarter of an hour chasing phantom errors and warnings instead.
When you edit a unit test to change the parameters to a test case, the test session will update and then move the selection to the top of the list. This is very annoying since it always scrolls away from the test area I had focused. It also has an annoying habit of nearly constantly changing the selected item in the tree, making navigation difficult.
This might be related to when tests are running or a build is running, but there’s always something like that going on—it’s not very nice that the whole IDE has to be quiet before I can use keyboard navigation in a tree without Rider constantly stealing focus and jumping around.
While running tests, Rider does not allow you to collapse nodes in the unit-test session. It quite annoyingly expands it again whenever you try to collapse a node.
Searching in tests is quite slow in both Rider and ReSharper.
Update 23.04.2021: I’ve discovered that I can use F4 in Rider to jump to the source of a test. That’s very handy because double-clicking on a test in either test runner has unpredictable results that seem to depend on whether the test is defined in a base class.
I can’t treat the Unit Test Session window as an editor window in Rider, so it’s harder to switch back and forth. The tests are docked at the bottom by default. You have to switch to that window with a hotkey, then use another hotkey to hide it. I’m getting used to it, but I don’t understand why Rider doesn’t support this feature (none of the other JetBrains IDEs I’ve used has it either).
Integrated debugging with auto-disassemble and sources in Rider is pretty awesome (e.g. I debugged into SimpleInjector without SourceLink). You can open any referenced type in any assembly and either have the original source from SourceLink [4] or disassembly. In either case, you can set breakpoints and debug into it. If the file is disassembled, it’s not always pretty, but it’s amazingly useful for inspection.
The Smart Step-in feature in Rider is a very nice upgrade, to which I’ve already become quite accustomed (just ⇥ to cycle locations). It’s a bit finer-grained than being able to disable property step-in universally in Visual Studio.
On the other hand, I’m not super-happy with the different ways of running an application in Rider. They seem to make it very difficult to debug an application and stop on unexpected errors. I’ve seen other users using Rider just kind of look in the output window as if live debugging wasn’t a feature we should all expect to work. It can be configured, but you have to make sure to run in debug mode and turn on exception-handling.
It’s also much harder to debug a StackOverflowException in tests because Rider doesn’t show a useful stack trace (it instead shows a trace for the LogException in the test runner itself). The “launch log file” is detailed, but provides no additional information. Instead, I was forced to set breakpoints and continually “edge closer” to the crash to find it myself. This is how Visual Studio used to work, but for a couple of years, its handling of stack overflows has been much better.
Also, Rider doesn’t stop on unhandled exceptions by default, either when running tests or running a web server. The stack trace in the debug output when running the web server isn’t highlighted and can’t be clicked.
The debugger in Rider does not make use of the DebuggerTypeProxy to display or format debugging information, which is a shame because Quino has useful customizations for debugger display that I miss in Rider.
I was unable to debug unit tests for a while because Rider complained that my DotNet runtime (AnyCPU) didn’t match the chosen testing target (x86). All of the solutions I’ve opened have been “Any CPU”-only, so I was mystified how Rider came up with the idea to run my tests as x86.
Rider pops up a helpful tip to take me directly to the setting to change the runtime to use. I don’t even have an x86 runtime. And I don’t want to run tests as x86 anyway.
The real fix is to go to Settings => Build, Execution, Deployment => Unit Testing => Default platform architecture and set it to “Automatic”. Mine was hard-coded to x86, for some reason (maybe a settings upgrade from an older version).
Viewing a variable isn’t as easy because Rider uses a much less-stable tooltip than VS. If you have a long value that you want to “view”, you have to cruise your mouse along a long, skinny tooltip for dozens of centimeters before you can click the “view” button (you have to know it’s there) at the end.
Since the tooltip is unstable, Rider has trained me to go down to the variable window and copy the value from there.
Both Rider and VS/ReSharper support navigation using SourceLink as of 2020.3, which is a massive win for usability. Now you can open a type with Ctrl + T or hit F12, ⌥ + Home, ⌥ + End to navigate to a related symbol from source and Rider/ReSharper will navigate within the SourceLink sources, which means that you can easily set breakpoints in code from NuGet packages, as long as they have SourceLink. Rider additionally offers support for setting breakpoints in disassembled code, with mixed results.
However, browsing works less well in Rider. For example, I pressed ⌥ + F12 on EventHandler to “peek” it and it popped up a processing dialog for 15 seconds before I canceled it. When I pressed F12 to navigate there instead, it didn’t show a progress dialog, but it also just seemed to break Rider because syntax-highlighting and code-completion stopped working for subsequently typed code. The “Errors in Solution” pane was similarly crippled, showing files with warnings, but no warnings. The navigation action never showed the code for the EventHandler, but it did make everything else stop working. A restart fixed everything.
In addition, navigation to authenticated sources was only working temporarily. It is broken in the most recent version of Rider, as I’ve documented in RIDER-61280.
The formatting for XML documentation works strangely when Rider inserts text in documentation (e.g. when you apply a fix). We use a tab size of 2 everywhere, but the settings window shows a tab size of 4, while also mentioning that some settings might be overridden by the .EditorConfig. Reformatting or cleaning up code fixes the indentation to where it should be. It’s unclear where Rider is getting its settings for the initial insertion.
Even with the StyleCop Analyzers, there are fewer fixes for XML documentation than with Visual Studio/ReSharper. For example, there is no way to quickly add parameter documentation. Rider does not have any significant support for generating documentation (the initial format is very compact and never formatted according to rules).
Rider’s parameter-completion in documentation works more smoothly (Esc not necessary), but it does not use “smart” sorting for tags. In ReSharper, once I’ve selected paramref once, that is sorted to the top and selected by default. In Rider, the order is unchanged, so I have to arrow down or type out most of the tag name in order to get past param.
Rider still shows a hint to add <inheritdoc/> on the class, even if the class has its own documentation.
There’s an extra item in the action list for “move to separate file” that does nothing. There’s another item that includes the name of the file in the caption that does work.
There’s no Enhanced Tooltip extension (and the tooltips are not as nicely formatted in Rider)
I can’t seem to change colors of icons as I can for ReSharper. I’d gotten used to brighter colors and miss them in Rider.
In ReSharper, you can disable specific inlay hints directly from the completion menu. In Rider, you can do this for some of them, but not all. If it’s not there, you have to select “Configure inlay hints” and then have to find the corresponding checkbox yourself.
Rider doesn’t keep track of the last opened solutions to open from the task list. [As of 2021.1.1, the task list is now populated with recent solutions.]
The “Commit” panel doesn’t refresh very quickly at all. Long after I’d seen the files in SmartGit, they were still not in the panel. When I switched away and then back, the new changes suddenly appeared. I don’t use the integrated Git support, but I’m not going to start, either, after seeing how it works.
I can’t search for the bindings for a key combination in Rider, like I can in Visual Studio. Instead, I have to guess at the name of the operation that I think it’s bound to.
Update 23.04.2021: I’ve found that if you click on the magnifying glass to the right of the search field, you can “Find actions by shortcut”.
Rider also doesn’t have the “show active configurations” panel, for some reason. I’m currently fighting with Rider because it suddenly came up with the idea to format everything with 4 spaces instead of 2 spaces. Just yesterday this was finally working so that I could reformat the document and everything worked. Now, Rider is reindenting everything for me. Visual Studio/ReSharper is showing that I have 2 spaces configured.
Although Visual Studio/ReSharper edged out Rider in most of these categories, you’re well-served with either one. I think if I’d compared Visual Studio by itself to Rider, then Rider would have won easily. It’s only in combination with ReSharper that Visual Studio ends up being a bit better. It’s just more mature and I never found myself going to Rider from Visual Studio, whereas I did have to open Visual Studio a few times to fix something I couldn’t do in Rider.
It’s happening less with each version, though. Over the four months of the evaluation, Rider has improved steadily. [5] I think you’re well-served with either one.
Once Rider files off a few more rough edges and has true feature-parity—perhaps by natively implementing some of the inspections from the ReCommended extension—its slightly smoother editor might help it pull ahead in this comparison.
Most of the above is complaining at a very high level, though. Both IDEs will make anyone who knows how to use them a much more efficient developer of reliable and readable code.
The last time I tried working with Visual Studio without ReSharper was over two years ago, with Visual Studio 2019 Preview 3. Still, I can see much more of Visual Studio working better than ever, taking over more and more of what I use ReSharper for.
I’d installed Visual Studio 2019 Preview 3 to investigate the following,
I installed the desktop and web-development workloads, totaling almost 6GB.
In November of 2020, Richard Lander wrote in the article Announcing .NET 5.0 (MS Blogs) that,
“[…m]oving forward, the idea is that as when we add new features to .NET, we’re also adding corresponding analyzers and code fixers to help you use them correctly, right out of the gate.”
and
“With .NET 5, we have heavily improved our support for static code analysis. This includes an analyzer for platform-specific code and a better mechanism to deal with obsoletions. The .NET 5 SDK includes over 230 analyzers!”
The latest versions of VS also allow you to fine-tune the severity of any warnings directly from the UI/Solution Explorer. This is all a great leap forward for Visual Studio 2019, but ReSharper still improves the following features:
Published by marco on 30. Mar 2021 21:04:45 (GMT-5)
As with installing a dotnet tool on Azure, there isn’t a standard task for setting a Git tag from a pipeline YAML configuration. The Pipeline UI has an option to easily do this, but that hasn’t translated to a task yet, nor does it look like it’s likely to, according to online discussions.
Setting a Git tag is relatively straightforward, but is complicated by permissions (as with installing a dotnet tool). To tag a build, you just execute the git commands in a script.
- task: CmdLine@2
  displayName: Push Git Tag
  inputs:
    script: |
      git tag $(Build.BuildNumber)
      git push origin $(Build.BuildNumber)
If, for whatever reason, you want the tag to be created by the triggering user, then include the following lines as well:
- task: CmdLine@2
  displayName: Push Git Tag
  inputs:
    script: |
      git config user.email $env:BUILD_REQUESTEDFOREMAIL
      git config user.name $env:BUILD_REQUESTEDFOR
      git tag $(Build.BuildNumber)
      git push origin $(Build.BuildNumber)
You should include this step after the version number has been updated.
With the task in place, you have to ensure that you’ve granted permissions to the proper user.
Published by marco on 29. Mar 2021 22:36:59 (GMT-5)
I have a .NET solution (Quino) that contains a project that I publish as a `dotnet` tool. The tool calculates a version number based on the branch and version number found in the solution. I use it from Quino itself and also from other project pipelines.
In order to use it from any pipeline (including Quino itself), I need to install it from the Quino artifact feed. The original solution is a couple of years old: I’d had a secure file for NuGet.Config that included the PAT. This works fine, until the PAT expires.
So, I went searching for a better solution and thought I’d try something a bit more resilient and better-supported. By now, I’m using YAML files for my pipeline, so I tried the DotNet task, but it doesn’t support installing tools.
There are open issues and even a very old open pull-request for supporting a Microsoft tool on Microsoft’s premiere hosting service that Microsoft has steadfastly ignored. There seem to be no plans for supporting dotnet tool install natively, with seamless authentication, as they’ve done for dotnet restore. The example below shows how this works for restore.
- task: DotNetCoreCLI@2
  displayName: 'Restore Server Packages'
  inputs:
    command: 'restore'
    feedsToUse: 'select'
    feedRestore: 'Quino'
    projects: 'server/src/**/*.csproj'
    verbosityRestore: Normal
    includeNuGetOrg: true
I was hoping to follow this pattern to use the dotnet task to install a tool with something like the following:
- task: DotNetCoreCLI@2
  displayName: 'Restore Server Packages'
  inputs:
    command: 'tool install'
    feedsToUse: 'select'
    feedRestore: 'Quino'
    includeNuGetOrg: true
    isGlobal: true
    toolName: quino
There is no support for this. The PR mentioned above would support it, but it’s never been accepted and Microsoft has not seen fit to add automatically authenticated feeds for anything other than restore.
Instead, I use two tasks: the first is a workaround for the lack of proper support in Azure for `dotnet tool install` from authenticated feeds; the second installs the tool. See “dotnet tool install/update” not working with Azure Artifacts #10057 and Add dotnet tool install command to support tools location in Azure Artifact feeds #13401 (the PR) for more information.
I can copy/paste the two tasks below into all of the pipelines that need it. It’s a bit bulky and non-intuitive, but it is both project-agnostic and doesn’t include any passwords or PATs directly. Instead, it uses the $(System.AccessToken). If the project has been granted access to the feed identified by <INTERNALFEEDURL> using the standard feed permissions control panel, then it works.
- task: NuGetCommand@2
  displayName: 'NuGet Add Credentials For Internal Feed'
  inputs:
    command: custom
    arguments: >
      sources add
      -Name "<INTERNALFEEDNAME>"
      -Source "<INTERNALFEEDURL>/nuget/v3/index.json"
      -Username "this_value_could_anything"
      -Password "$(System.AccessToken)"
- task: CmdLine@2
  displayName: Install tools
  inputs:
    script: dotnet tool install <TOOLNAME> --global
Where:
<INTERNALFEEDURL> is obtained from your Azure project
<INTERNALFEEDNAME> doesn’t matter, as long as it doesn’t conflict with any defaults
<TOOLNAME> is the name of the tool to execute
This is utterly unintuitive, but it works and it’s not too much hacking. I think it’s indisputable that it would be much nicer if “install tool” was an option for the “dotnet” command. It’s not like it’s an external tool. This is literally how Microsoft has asked us to work.
It would be nice if I hadn’t had to spend half an afternoon trying to figure out how to get a dotnet tool installed from a feed in the same project on Azure. I’m glad I got it working, but everyone who comes after will also waste time trying to figure this out—or will give up and use a gross hack instead.
Published by marco on 13. Mar 2021 22:09:08 (GMT-5)
The article Getting the Mouse Position using CSS by Bramus talks about a neat trick that uses sibling elements to react to mouse events without using JavaScript. It also features some kick-ass translucency and animation effects with CSS transitions.
As you move the cursor around, the layer of “cells” changes the X and Y positions that the CSS text elements “watch”. This lets the central elements “follow” the mouse, transforming a stack of “CSS” texts into a nicely composed and layered stack. It looks like this.
While this is a nice-looking effect—and it’s impressive that it works purely in the browser and purely in CSS—it kicked in the fan on my iMac, something that rarely happens.
That said, the compositing features of a modern browser are impressive and can save website authors a lot of time and effort. That this is even possible is already really, really nice. Maybe with a bit of tweaking, it can be made less detrimental to battery life.
If you want to try it out yourself or tweak the code, check out the CodePen.
Published by marco on 4. Mar 2021 22:39:04 (GMT-5)
Updated by marco on 4. Mar 2021 22:39:38 (GMT-5)
This a quick note for anyone else who’s downloaded the latest version of SmartGit (20.2.3 #16150) and is seeing mysterious stashes that they know they haven’t created.
There’s a new feature called “discard to stash” that is enabled by default.
What this does is to stash every time you press ⌘ + Z to discard changes. I understand that this is a failsafe “just in case”, but I kept ending up with a dozen stashes I had no use for. On balance, I’d rather have the tiny risk of wanting changes back that I’d discarded—I can’t recall this ever having happened—than the “noise” of stashes muddling the list of actual stashes I’d saved.
I started off trying to train myself to hit right arrow and then enter, or typing D instead of ⏎, but I gave up and found an “advanced” preference to switch the default behavior.
Published by marco on 17. Feb 2021 21:56:03 (GMT-5)
In one recent week, I realized I’d been working in many different areas and on many different projects, so I took an inventory.
For one project, I reconfigured a program with Delphi Pascal, using Delphi 7 (it’s a very old, legacy solution) to run on my local machine instead of in a VM that had swollen to 120GB. For that project, I also used SQL on SQL Server, running in a Docker container that I’d configured with YAML. The solution has several products, among which you can switch, so I wrote a Windows Batch program to transfer and back up versions, so you can nicely diff them with SmartGit using Git. In order to diff SQL, I used a tool written in TypeScript, which I extended with a few fixes and tests written with Jest in Visual Studio Code. I updated the documentation in Markdown.
At the same time, I was working on Quino, written in C# for the .NET platform, using Visual Studio on Windows and Jetbrains Rider on MacOS. I also set up a new solution using Quino, which involved editing a bunch of XML project files as well as configuring SQL Server and PostgreSql with Docker. I again used YAML to define pipelines in Azure DevOps.
For two evenings, I graded final projects for a JavaScript class I’ve just finished teaching. On the other evenings, I researched modern HTML, CSS, and SVG for an upcoming redesign of earthli. I made a few PHP fixes for earthli as well.
I wrote blog posts, wiki entries, and issue analyses in Jira Syntax, Markdown, XWiki syntax, and earthli Syntax in both English and German.
Published by marco on 17. Jan 2021 17:43:34 (GMT-5)
A software product with undocumented or poorly documented commits and a patchy issue-tracker is akin to a shipping pallet with 100 boxes haphazardly stacked on it, all wrapped up in shipping cellophane. You can see some of the labels and some of them you can’t and some of the boxes definitely don’t even have labels at all.
If it looks like the pallet to the right, then you already know you can’t ship it. That’s an obvious train-wreck of a project that’s going to blow up in everyone’s face. But the picture to the left looks…OK…ish. How do you know if it’s legit? Check the shipping manifest and get out your scanner gun, right?
The shipping manifest on your clipboard has 3 and ½ items on it, none of them really helpful. If you really want to be sure about what you’re shipping, you’re going to have to unwrap the whole thing and look at each box individually, noting it on the manifest if it’s missing — and maybe even opening it up to see what’s actually inside. Maybe it’s even broken and leaking on other boxes, somewhere in the middle of that whole pile.
Maybe someone wrapped it in cellophane to give it the sheen of reliability, but you can’t know for sure. Is it possible that you spend all of the time to dot the i’s and cross the t’s just in order to find out that it was fine, but just drastically under-documented? It’s possible, of course. That’s a risk you take when you try to be professional. The alternative is to become a gambler—shipping something and hoping that it doesn’t come back to haunt you.
A better approach would have been to use a documenting process as you built the product—like engineers rather than cowboys—slowing our awesome selves down a bit, but also—maybe, just maybe—getting faster because we’re more careful and can avoid wasting time on work that doesn’t need to be done.
Documenting the work to be done—e.g. to explain it to other team members—can have the much-appreciated side-effect of focusing you on the work that actually needs doing. This is generally more efficient and satisfying than just shooting out of the gate and doing what you “know needs doin’” and not noticing possible ramifications until it’s too late to do anything but react to them rather than plan for them.
In the end, you have not only solid, well-designed, and tested software, but also good documentation of what was actually done for a given release, as well as analyses for what was not done and what needs doing in the future. That everything is well-documented enough to implement now means you’ve got half a chance of still knowing what it means in ½ or ¾ of a year when you finally get a chance to plan and implement it.
Who knows? You may never need to work on it again—which is just fine. At least you’ll know what you didn’t implement and why. This is very helpful for that time, in a year or two, when you think of this exact same solution and are maybe too stressed or under too much pressure to remember why you decided against it the first time.
A good software product is not just the product itself, but all of the metadata surrounding it: the documentation, the analyses, the release notes, the roadmap.
Published by marco on 17. Jan 2021 00:01:34 (GMT-5)
Until now, PHP debugging involved a fragile balance between the IDE, the server, and the debugger, each with overly verbose configuration. On top of that, using Docker introduced the wrinkle that you were technically debugging on a remote server rather than on the “real” localhost.
It’s been a long journey, but it’s finally a lot easier to set up PHP debugging with a server running in a Docker container. Once you use the most modern tools, everything works with a couple of lines of configuration.
tl;dr:
- Ignore anything you find on StackOverflow from before November of 2020 and use the install-php-extensions project instead (see example below).
- Set environment variables in the docker-compose file to indicate the client and the default mode (debug)
- Use the latest PHPStorm, which supports XDebug 3.x
My setup is as follows:
So far, so good: it’s basically a standard developer setup for PHP where I have an IDE on my machine and am running servers in Docker containers. XDebug initiates a connection from the server in the “web” container back to the IDE on the docker host.
Without further ado, these are the magic configuration files to install extensions and set up XDebug for PHP.
After much searching and rigamarole and fighting with docker-php-ext-install and docker-php-ext-enable and PECL and where the PHP.INI is and whether I need to move one of the default files somewhere so that PECL can update it and downloading dependencies with apt-get and getting the right dependencies, depending on the PHP version and passing the right flags to docker-php-ext-configure if the version is a bit older and, and, and…
After trying a ton of no-longer-relevant and now-overly-complex suggestions on StackOverflow, I finally returned to php on dockerhub and discovered a hint to use the install-php-extensions project, which basically takes care of everything for you.
It does. End of story.
FROM php:7.2.24-apache
ENV DEBIAN_FRONTEND=noninteractive
ADD https://github.com/mlocati/docker-php-extension-installer/releases/latest/download/install-php-extensions /usr/local/bin/
RUN chmod +x /usr/local/bin/install-php-extensions && sync && \
install-php-extensions gd xdebug mysqli exif zip
I pin the PHP version to the one on my server, download the latest version of install-php-extensions [1] and then call it to install the non-standard extensions I use on earthli:
exif: Extract date information from pictures
gd: Generate thumbnails
mysqli: Provide access to MySql using a legacy API
xdebug: Debugging support on the server
zip: Open and read files from ZIP archives
See the web site for the list of supported packages. Your site will likely use different ones (but you should definitely install xdebug because it’s totally easy to use now).
Finally, you just need to set two environment variables to enable debugging for PHP:
XDEBUG_CONFIG: accepts a list of settings, but we only need to set the client_host to tell XDebug which machine hosts the IDE to which to connect (Docker handily provides the host.docker.internal alias for MacOS and Windows)
XDEBUG_MODE: this sets up the tool for step-debugging (see XDEBUG mode for more information).
I’ve included nearly the full docker-compose service definition from earthli, but the only relevant part for debugging is the environment block.
web:
  build: web
  container_name: "${COMPOSE_PROJECT_NAME}-web"
  restart: unless-stopped
  ports:
    - 80:80
  volumes:
    - ../site:/var/www/html
    - ../lib:/var/tmp/earthli.com-lib
    - ../../earthli-webcore/site:/var/tmp/webcore-site
    - ../../earthli-webcore/lib:/var/tmp/webcore-lib
    - ../../earthli-data:/var/tmp/earthli-data
    - ../../earthli-logs:/var/tmp/logs
    - ../config/apache-dev.conf:/etc/apache2/sites-available/000-default.conf
  depends_on:
    - db
  environment:
    XDEBUG_CONFIG: client_host=host.docker.internal
    XDEBUG_MODE: debug
At this point, you’re well on your way to debugging with PHPStorm. From here, follow the instructions in the settings dialog, shown below.
You could trigger a debugging session by including XDEBUG_SESSION=PHPSTORM in the query string, but that gets a bit tedious. Instead, install a browser-debugging extension, which simply injects the cookie XDEBUG_SESSION=PHPSTORM into the request so that PHPStorm knows that debugging is desired. See XDebug’s documentation for more information on other ways of triggering debugging, including from the command line (e.g. when running unit tests).
That’s it. A long and kind of painful journey has finally led to a solid and easy-to-configure debugging experience for PHP.
Published by marco on 5. Jul 2020 21:52:52 (GMT-5)
Groovy is a dynamically typed programming language that executes on the Java Runtime. It mixes its own highly dynamic syntax with islands of Java code. The Android ecosystem and its IDE use Gradle for its build scripts. Gradle uses the Groovy programming language.
A large project I’m working on contains quite a bit of custom Gradle code for integrating framework libraries, making obfuscated builds, configuring publication, and, finally, creating signed builds.
The signed builds are configured using standard Android Gradle DSL commands. Basically, there was a block of code something like the one shown below.
signing {
storeFile = getKeyStoreFile()
storePassword = getKeyStorePassword()
keyAlias = getKeyAlias()
keyPassword = getKeyPassword()
}
The names of the methods (e.g. getKeyAlias) used to be different before I’d refactored them to have more standard names. [1] The methods check whether there are environment variables set by the build server, using sane defaults for developer builds. [2]
This is where I went wrong. Never touch a running system [3], even when you’re trying to pull it back from the precipice of “maintenance nightmare that everyone is terrified to touch, to say nothing of change”. Well, I changed it, and ended up frittering away a couple of hours investigating the Groovy “feature” outlined below.
Groovy performs syntax-checking, but is extremely lenient as far as types and variables are concerned. Variables have to be defined, but pretty much anything can be coerced into anything else. It is transformed to Java code and then Java byte code by the Java compiler. Any typing errors you see are from the Java compiler, not from Groovy itself.
As any programming language would, Groovy resolves identifiers to match the declaration that is closest in scope to the call, even when that declaration is generated at compile-time and couldn’t possibly be the one that the original author had intended to call. This is going to be important later (which is why I put it in scary italics).
The four methods above are defined in an ext {} block. [4] Calling them without a specific target as above automatically resolves to the methods from the ext {} block.
Oddly, of the four properties being set in the example above, only the first two actually called the methods I’d defined in the ext {} block. The calls to getKeyAlias() and getKeyPassword() were not made to the expected functions. I could tell they weren’t being called because the logger.info() calls from those two methods never appeared in the output.
What the hell is going on? If you look carefully, you’ll notice that the first two methods have different names than you would use for writing the getter and setter for the properties being assigned. The second two methods match those names exactly.
When Groovy lowers its syntax to Java code, it declares these getters and setters. The Java compiler, in turn, references these new methods because the calls in the original Groovy code hadn’t been specific about the target of the methods. Instead of lowering to Java code and being explicit about which ext block the method should be called from, Groovy just left the naked call as I’d written it. Probably, if I’d explicitly called ext.getKeyAlias(), it would have avoided calling the dynamically generated this.getKeyAlias() method.
Of course, Groovy had trained me to stop prepending the target ext. on global function calls because ext resolves to different things, depending on the DSL-specific context. Sometimes it’s the root project’s extra variables, sometimes it’s the sub-project’s extra variables, and sometimes ext doesn’t work at all (e.g. in Java classes, naturally, but also in blocks created by special keywords).
Sure, you can try playing around with rootProject.ext. or other similar constructs, but the code quickly becomes even more unreadable than it already would be and the non-prefixed version works 99% of the time.
So what ended up happening was that, instead of calling the method I’d actually called, the Groovy compiler generated a new method with the same name and a higher specificity in the scope, capturing the call. Instead of calling my method, it ended up calling this.setKeyAlias(this.getKeyAlias()), which is basically a NOP that leaves the property empty.
The solution is to use a unique name for the function that does not conflict with any of the auto-generated getters. That is, of course, an unmaintainable nightmare, but part and parcel of working with Gradle.
signing {
storeFile = getKeyStoreFile()
storePassword = getKeyStorePassword()
keyAlias = getSigningKeyAlias()
keyPassword = getSigningKeyPassword()
}
Lo and behold, my log entries appeared and I was back in business.
The compiler authors could have tried harder to avoid altering the semantics of the higher-level Groovy code when replacing it with Java.
One way would be to use more obfuscated auto-generated getters and setters (to the degree that Java even allows this, which I think it does).
Another way was hinted at above: when lowering calls that auto-resolve to functions declared in ext regions, include information about the resolution in the call made in Java. That is, instead of just encoding getKeyAlias() as I’d written it (which is semantically correct at the Groovy level), transform that call to rootProject.ext.getKeyAlias() in Java.
Gradle is a shaky piece of business that automagically generates code that might replace actual, legitimate calls in your own code. It should never have been used for a build system. It makes MSBuild seem like a pretty good idea.
Gradle lets you declare “extra” variables in a scoped block called ext.
The parent of this block depends on the context. It’s usually rootProject, unless you’re executing project-specific code, in which case any declarations will be made in the sub-project-specific ext block instead of the one for the rootProject.
It can get quite confusing if you’re not sure which context you’re in when you declare an ext {} block, which is why some authors try to declare rootProject.ext or project.ext, but then you run into problems when you grab a variable from the wrong extra region.
Not to mention that it gets quite a bit messier to read and if all authors don’t stick to the same style, it becomes difficult to tell which explicit references are necessary and which are just thrown in there “to make sure”.
I settled on just declaring as much as possible in the ext {} and letting Groovy figure out which variable to use from scope. That ended up biting me in the ass exactly once, as detailed above.
Published by marco on 24. May 2020 22:27:00 (GMT-5)
The article Welcome to C# 9.0 by Mads Torgersen (Microsoft Dev Blogs) (May 2020) introduces several nifty new features that I am really looking forward to using.
I still haven’t moved Quino to C# 8, as the only feature I’d love to have there is the non-nullable types, which ReSharper Annotations provide with earlier versions of C#. Not only that, but the nullabilities are properly propagated to users of Quino. It’s understood that recent versions of Visual Studio and runtimes and compilers also do this but, until recently, our customers weren’t up-to-date yet.
In C# 8, we could also replace extension methods with default interface methods—but we’ve also been replacing almost all extension methods in Quino with singletons and composition anyway. A lot of the rest of the features are nice and interesting, but they are targeted optimizations that don’t really apply to a lot of the code that I write. I see how they are eminently useful for lower-level library and runtime optimization—many are clearly made to handle web requests and fine-grained tasks more quickly and without allocation.
Still, the features in C# 9 make an upgrade even more attractive.
Records and with-expressions (e.g. var originalPerson = otherPerson with { LastName = “Hunter” };)
Target-typed new expressions, which let you write this: Point p = new (3, 5); rather than this: var p = new Point(3, 5);
And, finally, covariant return types make an appearance. Java has had these for forever and there is no logical downside to introducing them.
This allows a descendant method to change the return type of an override to a descendant type as well. The most common use case would be for the return type of a Clone() method. The next step would be to allow anchored types (as in Eiffel), which would let a method declare its return type as like this and remove the requirement that each descendant override Clone at all, while still having the desired covariant return type.
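As a quick illustration, here is a minimal sketch of what a covariant return looks like in C# 9; the Shape and Circle types are made up for the example and are not from any real library.

public abstract class Shape
{
    // The base class declares the general return type...
    public abstract Shape Clone();
}

public sealed class Circle : Shape
{
    public double Radius { get; init; }

    // ...and the override narrows it, so callers of Circle.Clone()
    // get a Circle back without any casting.
    public override Circle Clone() => new Circle { Radius = Radius };
}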
I’ve been musing about these features for what feels like most of my career.
Published by marco on 13. Apr 2020 11:20:34 (GMT-5)
Updated by marco on 15. Apr 2020 15:47:26 (GMT-5)
The Web Animations Working Draft (W3C) was published in October of 2018. Can I use “Web Animations” (CanIUse) shows that the only browser that supports this API 100% is the latest technology preview on iOS and MacOS. Chromium-based browsers have had (very) basic support for quite some time, but Safari has thrown down the gauntlet with full support, which I learned about from Web Animations in Safari 13.1 by Antoine Quint (WebKit Blog).
This API is intended to replace many usages of CSS Animations and CSS Transitions, which are not only somewhat verbose and unwieldy for even simple cases, but are also not efficient in that each animation tends to force itself to start, artificially interrupting the browser as it prepares a page. With the Web Animations API, a page can much more declaratively indicate its intent without force-calculating animation target values, as is required now with CSS Animations.
A page can create and launch animations, but it can also get a reference to that animation and change it on-the-fly afterward. You can play it, pause it, change the play position, the play state, hook into the animation lifecycle with a Promise
-based API, and much more. A page can even get all of the animations associated with an element or the entire document and manipulate them wholesale. Safari’s new inspector uses this API to offer much richer display and control of all running animations. Understandably, Safari has reimplemented CSS Animations and CSS Transitions on top of a whole new animation engine that the Web Animations API also controls.
Safari puts a very strong implementation forward, with only two features missing:
Published by marco on 13. Apr 2020 11:18:02 (GMT-5)
Despite the title, from what I can gather from 10 Things I Hate About PostgreSQL by Rick Branson (Medium), the author is a big fan of PostgreSql. However, he has such vast experience with it that he can still list 10 things that don’t work as well as they could.
They seem to boil down to:
The plan-builder doesn’t support planning hints, which means you can’t patch a query in production to buy time: you have to either meta-patch it (i.e. figure out some way of sending a “hint” to the planner through other means) or fix it for real, which can take a lot more time while your production servers are blowing up. From the article,
“I do understand their reasoning, which largely is about preventing users from attacking problems using query hints that should be fixed by writing proper queries. However, this philosophy seems brutally paternalistic when you’re watching a production database spiral into a full meltdown under a sudden and unexpected query plan shift. (Emphasis in original.)”
Published by marco on 21. Mar 2020 18:37:55 (GMT-5)
Updated by marco on 15. Apr 2020 15:50:21 (GMT-5)
The programmable notebook Introduction to D3 by Arvind Satyanarayan (MIT Visualization Group) is part of a full course at MIT about Interactive Data Visualization.
The linked notebook uses D3.js, but previous classes in the course have dealt with Vega, which is,
“[…] a visualization grammar, a declarative language for creating, saving, and sharing interactive visualization designs. With Vega, you can describe the visual appearance and interactive behavior of a visualization in a JSON format, and generate web-based views using Canvas or SVG.”
Vega is a higher-level abstraction than D3 and is, therefore, both more powerful and more limited than it.
If what you want to build fits the higher-level building blocks of Vega (see examples), then you’ll be done more quickly with that; if it doesn’t, then D3.js offers more flexibility as it functions at finer granularity.
“[…] grammars [like Vega] break visualization design down into a process of specifying mappings (or visual encodings) between data fields and the properties of graphical objects called marks. They’re useful for concisely and rapidly creating recognizable visualizations, while giving us more design flexibility (or expressivity) than chart typologies like Microsoft Excel.
“However, describing visualization design in these high-level terms limits the types of visualizations we can create. For example, we can only use the available marks, and can only bind data to supported encoding channels.”
With D3.js, you have to do a bit more legwork yourself, but it offers more graphical flexibility and possibilities. Instead of customizing the settings for predefined renderers (or “marks”), you define the renderers yourself: the notebook includes examples in HTML and SVG. To keep things simple, the SVG examples replicate the HTML examples, but they could render much more that is not so easy to realize in HTML.
Although D3.js has a reputation as a “charting library”, that moniker is actually more appropriate for Vega. D3.js is a generalized data-to-graphics mapping library. As you can see from the examples, it is very useful for charts, but allows a lot more customizability than Vega. Anyone building charts for their site should consider very carefully whether the additional power and complexity are warranted vs. a solution with something like Vega.
That said, it was a lot of fun getting to know D3 with this notebook. The notebook is extremely well-written and organized and it’s absolutely fantastic that it’s available online, for free. I was able to understand and execute all of the exercises and feel like I have a good enough grasp of D3 now to be able to build something with it. Perhaps more importantly, I feel that I can now:
Published by marco on 21. Mar 2020 15:59:18 (GMT-5)
I found the article A half-hour to learn Rust by Amos to be extremely helpful in learning the syntax and mechanics of Rust.
It starts out with the absolute basics:
“let introduces a variable binding […]”
then takes you through
Options
mutables
Index and IndexMut
Results
panic and unwrap, expect() and ?
Fn, FnMut, and FnOnce
move
for … in
and ends up with a function builder that tests strings:
fn make_tester<'a>(answer: &'a str) -> impl Fn(&str) -> bool + 'a {
    move |challenge| {
        challenge == answer
    }
}

fn main() {
    let test = make_tester("hunter2");
    println!("{}", test("*******"));
    println!("{}", test("hunter2"));
}
// output:
// false
// true
Published by marco on 7. Mar 2020 18:43:04 (GMT-5)
Updated by marco on 8. Mar 2020 10:58:06 (GMT-5)
Now that Quino 8.x is out the door, we can look forward to Quino 9.
Quino 8 is a very solid and stable release that has already been test-integrated into many of our current products running on Quino. We don’t anticipate any more low-level API changes, though there will be follow-up bug-fix releases.
There are a few larger-scale changes, improvements, and enhancements, outlined below (and noted in the roadmap).
With this release, we’ve got more coverage than ever. Excluding only generated code (e.g. *Metadata.cs and *.Class.css in the model assemblies), we ended up with a respectable 81% test coverage. Quino has almost 10,000 tests comprising about 51k LOC and covering 82k LOC. [1] Many, many of these are integration and scenario tests. With this level of test coverage, we feel comfortable with refactoring to improve usability and performance.
One of the primary near-term goals is to improve Quino’s documentation story. The aim is to take a new developer through the common tasks of working with a solution based on Quino.
Some of this documentation is currently still out-of-date or will change as we improve the corresponding components. For example:
Nant is no longer relevant
quino tool documentation will no longer be relevant after 8.1 (see tools-related issue in the issue tracker)
The latest table of contents is much more comprehensive than before and we’re still improving it.
We don’t have an integrated search for the conceptual documentation yet, but you can use Google’s site-specific search. For example, search for configuration with the following search text “configuration site:docs.encodo.ch”. The top results are:
Which is pretty decent, overall.
Several of our upcoming products using Quino (two are so new that they’re not yet listed) are replacing legacy products that are highly dependent on a central database that defines the application domain. That is, the model is in the database or in a model description that is not initially a Quino model.
Instead of defining the model in C# code manually and then building the database from that (the standard approach with Quino), these products define the model with varying levels of automation and import and then use the existing database.
The following list shows the various ways that we’re building Quino models, in addition to the standard approach of defining them in C#:
This allows customers with existing databases to relatively quickly and easily produce a Quino model that gets them access to the plethora of features available to Quino applications (e.g. ORM, schema-check and -migration, generated GUI for desktop or web, and so on).
The LOC analyzer included in Visual Studio had slightly different numbers:
Quino has almost one line of testing code per line of library code (43k/56k ~ 77%). Quino has almost 4 lines of non-executable code per line of executable library code (202k/56k ~ 360%).
The disparity between the two results (JetBrains DotCover and Microsoft Visual Studio) just goes to show what a fraught metric LOC really is. According to these two measurements, Quino has between 56k and 83k LOC of executable library code.
Published by marco on 22. Feb 2020 17:43:38 (GMT-5)
The summary below describes major new features, items of note and breaking changes.
The links above require a login.
CreateGuid(), CreateDate(), and CreateTime(). (QNO-6304, QNO-6305)
Before upgrading, products should make sure that they do not depend on any obsolete members in the current version (7.x).
Quino-Web 8.0 is a rewrite and is therefore mostly incompatible with 7.x.
See the Quino-Web/Sandbox.Web project for a working example. This integrates the standard SandboxApplication into a web site using the standard GenericController and MetadataController to provide data and UI to the generic Quino Client.
Some internal types in Quino-Standard have been moved to more appropriate namespaces and assemblies, but the impact on products should be non-existent or very limited.
The following types were moved from Encodo.Quino.Core to Encodo.Quino.Culture:
LanguageTextAttribute
IValueParser
CaptionAttribute
LanguageDescriptionAttribute
The following types were moved from Encodo.Quino.Core to Encodo.Quino.TextFormatting:
IFileSizeFormatter
Quino’s default culture-handling has been overhauled. Instead of tracking its own language, Quino now uses the standard .NET CultureInfo.CurrentUICulture for the default language and CultureInfo.CurrentCulture for default formatting (e.g. times, dates, and currencies). Many fields have been marked as obsolete and are no longer used by Quino.
The default languages in Quino have changed from “en-US” and “de-CH” to “en” and “de”, respectively.
The reasoning behind this is that, while a _requested language_ should be as specific as possible, a _supported language_ should be as general as possible. The standard culture mechanisms and behavior (e.g. .NET Resources) “fall back” to a parent language when a more-specific language cannot be found. If an application claims to only support “en-US”, then a request for “en-GB” fails. If the supported language is “en”, then any request to a language in the “en” family (e.g. “en-US”, “en-GB”, “en-AU”) will use “en”.
An application that supports “en-US” and “de-CH” has, therefore, a more limited palette of languages that it can support.
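A tiny sketch of the fallback chain that makes this reasoning work, using only the standard CultureInfo API:

using System;
using System.Globalization;

// "en-GB" is not directly in a supported list, but its parent culture is plain "en",
// so the standard .NET resource-fallback mechanism can still satisfy the request.
var requested = new CultureInfo("en-GB");

Console.WriteLine(requested.Parent.Name); // prints "en"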
Quino code runs in the context of a user, who has a list of preferred languages, in decreasing order of preference. This context can last the entire duration of an application (e.g. a standalone application like a console or desktop application) or last as long as a web request.
The application itself has a list of languages that it supports, as well as resources and metadata that defines text in these languages. The resources are standard .NET Resources with the standard fallback mechanism (i.e. a request for “en-US” can be satisfied by “en”). The metadata uses DynamicString
objects, which encapsulate a map from language codes (e.g. “en” or “de”) to strings.
During application startup or at the beginning of a web request, the ILanguageResolver
determines the language to use for a given set of requested languages. In ASP.NET Core, the requested languages come from the HTTP headers provided by the browser. In standalone applications, the IRequestedLanguageCalculator
provides the requested languages. The ILanguageInitializer
is responsible for coordinating this during application startup.
The rest of Quino uses the following singletons to work with languages.
IDynamicStringFallbackCalculator: Comes into play when a request is made for a language that is not directly supported. For example, if the application supports “en” and “de”, then a request for “en-US” will ask this singleton how to resolve the request.
IDynamicStringFactory: Creates a dynamic string to describe a given object. The default implementation uses .NET Attributes.
ILanguageResolver: Determines the culture to use from a list of available cultures and a list of requested/preferred cultures.
IRequestedLanguageCalculator: Provides the sequence of languages from which to choose during initial resolution (web requests _do not_ use this).
ILanguageInitializer: Integrates language-selection into the application startup.
ICaptionCalculator: Extracts a single caption for a culture from a given object. Applications should use the IDynamicStringFactory in most cases, instead.
An application can control fallback by registering custom IDynamicStringFallbackCalculator and ILanguageResolver implementations (though this is almost certainly not necessary).
Any product that calls AddEnglishAndGerman()
will automatically be upgraded as well. A product can avoid this change by calling AddAmericanEnglishAndSwissGerman()
instead.
A product that uses the new languages will have to replace all fields in reports targeted at “en-US” and “de-CH” to target “en” and “de” instead.
A product that does use the new default languages will have to determine how to migrate database fields created for languages that are no longer explicitly supported. If the model includes value-lists (enums) or multi-language properties, the application will have to migrate the database schema to update multi-language fields (e.g. “caption_en_us” => “caption_en”).
A product that sets MetaIds
manually will migrate without modification (Quino will rename the property in the database).
A product that does _not_ set MetaIds
(this has been the default in Quino since version 2) will have a MetaID mismatch because the name has changed.
By default, Quino will migrate by attempting to drop, then re-create multi-language properties. In the case of value-list captions, this is harmless (since the data stored in these tables are generated wholly from the metadata). For actual multi-language properties with user data in them, this is _a problem_.
The simple solution is to call UseLegacyLanguageMappingFinalizerBuilder()
during application configuration to ensure a smooth migration (Quino will rename the property in the database).
A product that updates its languages should regenerate code to update any generated language-specific properties. Properties that had previously been generated as, e.g. Caption_en_us
will now be Caption_en
.
Published by marco on 18. Feb 2020 09:08:08 (GMT-5)
I prefer to be very explicit about nullability of references, wherever possible. Happily, most modern languages support non-nullable references natively (e.g. TypeScript, Swift, Rust, Kotlin).
As of version 8, C# also supports non-nullable references, but we haven’t migrated to using that enforcement yet. Instead, we’ve used the JetBrains nullability annotations for years. [1]
Recently, I ended up with code that returned a null even though R# was convinced that the value could never be null.
The following code looks like it could never produce a null value, but somehow it does.
[NotNull] // The R# checker will verify that the method does not return null
public DynamicString GetCaption()
{
    var result = GetDynamic() ?? GetString() ?? new DynamicString();

    return result;
}
[CanBeNull]
private DynamicString GetDynamic() { … }
[CanBeNull]
private string GetString() { … }
So, here we have a method GetCaption() whose result can never be null. It calls two methods that may return null, but then ensures that its own result can never be null by creating a new object if neither of those methods produces a string. The nullability checker in ReSharper is understandably happy with this.
At runtime, though, a call to GetCaption() was returning null. How can this be?
There is a bit of code missing that explains everything. A DynamicString declares implicit operators that allow the compiler to convert objects of that type to and from a string.
public class DynamicString
{
    // …Other stuff

    [CanBeNull]
    public static implicit operator string([CanBeNull] DynamicString dynamicString) => dynamicString?.Value;
}
A DynamicString contains zero or more key/value pairs mapping a language code (e.g. “en”) to a value. If the object has no translations, then it is equivalent to null when converted to a string. Therefore, a null or empty DynamicString converts to null.
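To make the surprise concrete, here is a small, hedged sketch that reuses the DynamicString type from this post (so it is not standalone) and shows the conversion in isolation:

// An empty DynamicString has no translations, so its Value is null and the
// implicit operator shown above therefore yields null.
DynamicString empty = new DynamicString();
string converted = empty;              // runs the implicit conversion
Console.WriteLine(converted == null);  // True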
If we look at the original call, the compiler does the following:
GetDynamic() sets the type of the expression to DynamicString.
The ?? operator can only be applied if both sides are of the same type; otherwise, the code is in error.
Since DynamicString can be coerced to string, the compiler decides on string for the type of the first coalesced expression.
The second ?? triggers the same logic, coercing the right half (DynamicString) to the type it has in common with the left half (string, from before).
Since the expression is a string in the end, even if we fall back to the new DynamicString(), it is coerced to a string and thus, null.
Essentially, what the compiler builds is:
var result =
    (string)GetDynamic() ??
    GetString() ??
    (string)new DynamicString();
The R# nullability checker sees only that the final argument in the expression is a new expression and determines that the [NotNull] constraint has been satisfied. The compiler, on the other hand, executes the final cast to string, converting the empty DynamicString to null.
DynamicString-to-string Conversion
To fix this issue, I avoided the ?? coalescing operator. Instead, I rewrote the code to return DynamicString wherever possible and to implicitly convert from string to DynamicString, where necessary (instead of in the other direction).
public DynamicString GetCaption()
{
    var d = GetDynamic();
    if (d != null)
    {
        return d;
    }

    var s = GetString();
    if (s != null)
    {
        return s; // Implicit conversion to DynamicString
    }

    return GetDefault();
}
The takeaway? Use features like implicit operators sparingly and only where absolutely necessary. A good rule of thumb is to define such operators only for structs which are values and can never be null.
I think the convenience of being able to use a DynamicString as a string outweighs the drawbacks in this case, but YMMV.
Java has @NonNull and @Nullable annotations, although it’s unclear which standard you’re supposed to use. (StackOverflow)
Published by marco on 30. Jan 2020 22:30:05 (GMT-5)
Updated by marco on 30. Jan 2020 22:30:51 (GMT-5)
After years of getting incrementally better at fixing binding redirects, I’ve finally taken the time to document my methodology for figuring out what to put into app.config or web.config files.
The method described below works: when you get an exception because the runtime gets an unexpected version of an assembly—e.g. “The located assembly’s manifest definition does not match the assembly reference”—this technique lets you formulate a binding-redirect that will fix it. You’ll then move on to the next binding issue, until you’ve taken care of them all and your code runs again.
If you have an executable, you can usually get Visual Studio (or MSBuild) to regenerate your binding redirects for you. Just delete them all out of the app.config
or web.config
and Rebuild All. You should see a warning appear that you can double-click to generate binding redirects.
If, however, this doesn’t work, then you’re on your own for discovering which version you actually have in your application. You need to know the version or you can’t write the redirect. You can’t just take any number: it has to match exactly.
Where the automatic generation of binding redirects doesn’t work is for unit-test assemblies.
My most recent experience was when I upgraded Quino-Windows to use the latest Quino-Standard. The Quino-Windows test assemblies were suddenly no longer able to load the PostgreSql driver. The Quino.Data.PostgreSql assembly targets .NET Standard 2.0. The testing assemblies in Quino-Windows target .NET Framework.
After the latest upgrade, many tests failed with the following error message:
This is the version that it was looking for. It will either be the version required by the loading assembly (npgsql in this case) or the version already specified in the app.config (that is almost certainly out of date).
To find out the file version that your application actually uses, you have to figure out which assembly .NET loaded. A good first place to look is in the output folder for your executable assembly (the testing assembly in this case).
If, for whatever reason, you can’t find the assembly in the output folder—or it’s not clear which file is being loaded—you can tease the information out of the exception itself.
When the debugger breaks on the System.IO.FileLoadException, click “View Details” to show the QuickWatch window for the exception. There’s a property called FusionLog that contains more information.
The log is quite detailed and shows you the configuration file that was used to calculate the redirect as well as the file that it loaded.
With the path to the assembly in hand, it’s time to get the assembly version.
Showing the file properties will most likely not show you the assembly version. For third-party assemblies (e.g. Quino), the file version is often the same as the assembly version (for pre-release versions, it’s not). However, Microsoft loves to use a different file version than the assembly version. That means that you have to open the assembly in a tool that can dig that version out of the assembly manifest.
The easiest way to get the version number is to use the free tool JetBrains DotPeek or use the AssemblyExplorer in JetBrains ReSharper or JetBrains Rider.
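If you prefer a script to a GUI, a few lines of C# can read the same information straight from the file’s metadata; the path below is just a placeholder for wherever the loaded assembly actually lives:

using System;
using System.Linq;
using System.Reflection;

// Reads the assembly name, version, and public-key token directly from the file.
var name = AssemblyName.GetAssemblyName(@"C:\path\to\System.Numerics.Vectors.dll");
var token = string.Concat(
    (name.GetPublicKeyToken() ?? Array.Empty<byte>()).Select(b => b.ToString("x2")));

Console.WriteLine($"{name.Name}, Version={name.Version}, PublicKeyToken={token}");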
You can see the three assemblies that I had to track down in the following screenshot.
Armed with the actual versions and the public key-tokens, I was ready to create the app.config
file for my testing assembly.
And here it is in text/code form:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<dependentAssembly>
<assemblyIdentity
name="System.Numerics.Vectors"
publicKeyToken="B03F5F7F11D50A3A"
culture="neutral"
/>
<bindingRedirect oldVersion="0.0.0.0-4.1.4.0" newVersion="4.1.4.0"/>
</dependentAssembly>
<dependentAssembly>
<assemblyIdentity
name="System.Runtime.CompilerServices.Unsafe"
publicKeyToken="B03F5F7F11D50A3A"
culture="neutral"
/>
<bindingRedirect oldVersion="0.0.0.0-4.0.5.0" newVersion="4.0.5.0"/>
</dependentAssembly>
<dependentAssembly>
<assemblyIdentity
name="System.Threading.Tasks.Extensions"
publicKeyToken="CC7B13FFCD2DDD51"
culture="neutral
"/>
<bindingRedirect oldVersion="0.0.0.0-4.2.0.1" newVersion="4.2.0.1"/>
</dependentAssembly>
</assemblyBinding>
</runtime>
</configuration>
Published by marco on 2. Jan 2020 10:41:06 (GMT-5)
Fossil is a distributed Source Control Manager that claims to offer the same power without the complexity of Git. The article Fossil: Rebase Considered Harmful by D. Richard Hipp (Fossil SCM) is part of the documentation for the tool.
One of the main selling points of Fossil is that it does not support rebase. In the article, the author lays out the many ways in which rebasing causes no end of woes for developers using Git.
I’d heard of Fossil before and I’d even skimmed this document before. This time around, though, I read it through to learn the author’s reasoning. My short take is: I do not want to use an SCM that does not allow rebase. [1] I think a project benefits greatly in clarity if a developer is able to alter the local history before cementing commits into an unalterable history (i.e. pushing to the server).
The following definitions are not complete, but are sufficient for the ensuing discussion.
A rebase is considered a destructive operation because it discards part of the history of the repository by rewriting commits.
If I think about it, though, many of the operations I’m accustomed to making are destructive:
All of these operations are considered destructive because they modify the “true” history of the repository. But what do we mean by “true” history? Where does the story start?
The changes outlined above are not for sharing. It’s not interesting to the final reader that I had to backspace through and re-spell the word “outlined” in the previous sentence. It might be interesting to see different drafts, though, to see how I arrived at the final version. But those changes are at a different level of granularity.
Who decides where one level of granularity stops and the next begins? I think it’s the author of the commits. My workflow over the last ten years is based heavily on being able to massage commits so that I can prepare what I share to the server repository, where it can no longer be changed. I agree that there should be an unalterable history, but disagree with the author on where that history begins.
I agree with the author that developers should not work in silos, massaging their code until it is perfect, pushing only once there are no more errors and no-one could possibly take issue with anything in the feature. At this point, the author purports that many developers squash all of their local commits to a single so-called hairball commit that makes it look like the code sprung from the forehead of the developer as Eve sprung from Adam’s rib: whole and without blemish.
Hairball commits are acknowledged as bad, so attacking them as the prime reason to eliminate the tool that allows them seems to be more of a straw man.
Preventing developers from making any changes to local commits is not the way to solve the problem, though. While Fossil does not allow discarding any single commit from the history, the author acknowledges that Fossil allows developers to apply addenda that the common Fossil tools will show while hiding the original commits. [2]
I see the author’s point—that (potentially) important parts of a history are retained whether the developer wants them or not. That is, it is not up to the developer to decide, but up to the archeologist examining the commits later. This is an interesting idea, but the argument is ultimately not convincing.
Let’s suppose a developer uses an SCM without rebase. Either there will be many commits in the history that—contrary to what the author claims—do not provide any clarity because they are garbage commits (e.g. WIP and other sorts of investigatory commits that were quickly reverted or undone). Or, the developer will be terrified of making a commit before it’s ready and runs the risk of losing work or working less efficiently.
Developers will not magically become ego-less and kowtow to the machine. Instead, they will pick up bad habits that are worse than local rebasing. They will keep work uncommitted for too long or will fail to split up commits properly because they are afraid that they can’t fix them up later. In either case, it’s chaos in the commit history and the project efficiency and reliability suffers.
But the author is arguing with a straw man that doesn’t really exist outside of shitty developer teams with undisciplined developers. One can argue that these are the kind of developers that many projects have, but that can only be addressed with process. Weakening the tools so that disciplined developers are less efficient is a bad idea.
You don’t like hairball commits? Tell developers to stop making them. Enforce the policy with reviews. The Git documentation already encourages developers to make focused commits. Rebase allows a developer to split up commits during or after a code review. Rebase can actually be used to combat hairball commits.
I have personally used it to split up commits that inadvertently mixed a bug fix or two into a large pile of refactoring changes. I’ve also often advised people to redesign their commits so that they tell a better story.
I’ve interspersed citations from the document linked above and included responses and thoughts.
“[…] [some tools] accomplish things that cannot be done otherwise, or at least cannot be done easily. Rebase does not fall into that category, because it provides no new capabilities. (Emphasis added.)”
As discussed above, I think that this is fundamentally wrong. My workflow is considerably different than it was before I used Git or had access to rebase. I would now be much less efficient if I didn’t have rebase. It would make me constantly focus on cleaning up commits before I really care to. You could make the argument that cleaning up afterward takes more time, but I haven’t experienced that to be the case. Instead, I want to be able to set the priorities rather than worry about committing something that I cannot undo.
And it’s not about ego or “looking stupid” to future readers of the history; instead, it’s about having control of the story you tell to those same readers. If you don’t have rebase, then you tell just as poor a story as if you use rebase badly. It’s perhaps closer to the “true” story, but it’s not the “best” story. Without rebase, you’re forcing future archeologists of your code to read all drafts as well as the final version simultaneously.
At Encodo, we don’t focus on ego, we focus on efficiency. We do not obliterate commits that make sense just to squash a whole feature. We retain commits in order to tell a good story about how a feature was built. We do not emphasize being able to build each commit: often we’ll add a failing test in one commit, then fix the bug in another commit, because that tells a better story.
We need rebase in order to massage local commits so that they tell this good story rather than uploading dozens of commits that no-one should ever have to look at (typos, code comments, formatting, etc.). Often, we’ll squash in little fixes and changes that come up during a review. Is the Fossil author suggesting that there is some benefit to seeing these in a separate commit? It would make understanding the commits at a later time that much harder.
I think most of the author’s concerns are addressed by using review and process to enforce better commits. Fossil can’t make this happen because the developers have to create good commits in the first place or, at least, eventually. Rebase helps better developers clean up their own commits and also helps them help others clean up their commits, teaching them how to tell the story of their code.
“A rebase is just a merge with historical references omitted”
Exactly. If I can’t eliminate WIP commits or squash local commits, then my workflow changes. Honestly, what’s the point of keeping each commit? Many are scribbles, unwanted drafts. They’re not part of a history anyone would retell. Once commits are cleaned up and tell a good story, there is no need to keep the old commits around. At that point, you’re wasting the future archeologist’s time.
“Surely a better approach is to record the complete ancestry of every check-in but then fix the tool to show a “clean” history in those instances where a simplified display is desirable and edifying, but retain the option to show the real, complete, messy history for cases where detail and accuracy are more important.”
This feature is an interesting one for commits that can no longer be changed (i.e. have already been pushed), but why make the developer mark every accident and mistake instead of just letting him undo them? The “full” view would be of marginal to no value. Even once the messy commits were deciphered, they would most likely yield no useful information.
What possible benefit is it to keep a jungle of “fix typo” and “add missing file” or “fix broken test” commits just because the developer made a commit before running tests or seeing a warning in the IDE? [3]
“So, another way of thinking about rebase is that it is a kind of merge that intentionally forgets some details in order to not overwhelm the weak history display mechanisms available in Git.”
I honestly think that this guy just wants to make Git look stupid and Fossil look spectacular. I understand fully that it’s silly to argue that Git doesn’t need a feature that Fossil has just because I’ve personally never needed it. A good feature is something that becomes essential once you have it, but you never knew you were missing it or were less efficient without it. Fossil’s ability to easily see which changes were made to a file after a given commit sounds like it might be that kind of feature. However, rebase in Git is such a feature, so if Fossil takes that away, it’s a deal-breaker.
At this point, I think also that the author is considering Git as a command-line application rather than extended by a truly powerful UI like SmartGit, which provides fast access to gobs of historical data with little effort.
“Or, to put it another way, you are doing siloed development. You are not sharing your intermediate work with collaborators. This is not good for product quality.”
What has this guy seen in the wild that he’s reacting this way? Who hurt this poor man? How often does he expect us all to commit and push to the server? Should we code directly on the server? Where does he draw the line for “siloed” work? A day? An hour?
More to the point: who is paying developers (or a project lead) to examine unvetted commits? Do you think we’re made out of free time? Keeping everything around forever is not the most efficient way of optimizing information about your code. It’s a hoarder mentality.
I understand the sentiment: you want to avoid people massaging commits into oblivion, eliminating important information. But, honestly, I’ve seen the opposite problem: commits pushed to the server in the shabbiest form, thereafter unalterable. [4]
“Many developers are drawn to private branches out of sense of ego. “I want to get the code right before I publish it.””
No, that is not my requirement. I want an efficient review that pinpoints (and fixes) errors quickly so no-one wastes time.
The author claims that,
“Rebase adds new check-ins to the blockchain without giving the operator an opportunity to test and verify those check-ins. Just because the underlying three-way merge had no conflict does not mean that the resulting code actually works. Thus, rebase runs the very real risk of adding non-functional check-ins to the permanent record.”
This is true only for the special case of online merges. These should be avoided like the plague, in any case. I know that people really, really trust their tools. I know that they think that merges are infallible, that their CI builds their software and runs their tests and gives their pull request a green flag and a thumbs-up.
But anything other than a trivial pull request should be examined with tools more capable than online repository managers. Not only are they not as good, they are wildly inefficient when compared to a good desktop tool. I know this next generation of developers want to do everything on their phones, but this is ridiculous. The screen is too small and the tools are too limited.
Get a machine with usable screen real estate and learn what being efficient really means. Not only will you be quicker, you’ll be better: your error rate will decrease and you’ll see connections in the commits much better than with the (comparatively) meager online tools. I’ve written before about one such UI, SmartGit, in Git: Managing local commits and branches and Using Git efficiently: SmartGit + BeyondCompare.
Other online tools have similar weaknesses versus their desktop brethren: for example, text editors like Word or Google Docs. It’s definitely a killer feature that they’re online, but their only selling point is that they’re attached to an online document storage. That’s the selling point. As amazing as it is that these tools run in a browser, they are pathetic compared to tools from thirty years ago. My God, I fondly remember WriteNow 4.0 for Mac OS 6 and 7, which handled a 250-page document with aplomb, complete with figures, tables, TOC, numbering, custom styles, … all of those things that an editor should do. Somehow, just because it’s in the cloud means that we should be happy with WordPad instead of a full-fledged editor. It’s a joke.
The author claims that,
“Rebasing is the same as lying By discarding parentage information, rebase attempts to deceive the reader about how the code actually came together.”
Then you should include all command/undo buffers from your IDE, too. At this point in the document, the author is just repeating the same argument over and over, reformulated but not different.
“Unless your project is a work of fiction, it is not a “story” but a “history.” Honorable writers adjust their narrative to fit history. Rebase adjusts history to fit the narrative.”
That’s not even how human history works. It’s not even how your own stories about your own life work. This is the kind of mentality that wants to keep all 6000 pictures from a vacation. Why? Just in case you need that picture of the ground that you took by accident? Because you need all 300 pictures of the Matterhorn? You’re wasting your readers’ time and your own.
“The intent is that development appear as though every feature were created in a single step: no multi-step evolution, no back-tracking, no false starts, no mistakes.”
Again, he proposes to fix a problem—poorly built commits—by not allowing anyone to modify commits.
“We believe it is easier to understand a line of code from the 10-line check-in it was a part of — and then to understand the surrounding check-ins as necessary — than it is to understand a 500-line check-in that collapses a whole branch’s worth of changes down to a single finished feature.”
I agree with this 100%. As already noted above, though, the review should disallow such foolish hairball commits.
“The more comments you have from a given developer on a given body of code, the more concise documentation you have of that developer’s thought process.”
Correct. But you don’t want to see everything. He presents a false choice between all the history and an improperly truncated version. Then he says he’d rather have all of it, and wants to get rid of history-rewriting. This doesn’t fix the problem of shitty programmers making shitty commits. The only way to fix that is gatekeeping reviews and process. Taking a vital tool for clarity (rebasing) away from disciplined programmers is a terrible idea.
“If we rebase each feature branch down into the development branch as a single check-in, pushing only the rebase check-in up to the parent repo, only that fix’s developer has the information locally to perform the cherry-pick of the fix onto the stable branch.”
He really seems to be attacking a repo-management/history-editing process I’ve never used. It sounds horrid.
“Rebasing is an anti-pattern. It is dishonest. It deliberately omits historical information. It causes problems for collaboration. And it has no offsetting benefits.”
Only one of those sentences is true.
Published by marco on 28. Dec 2019 23:23:06 (GMT-5)
Updated by marco on 28. Dec 2019 23:23:47 (GMT-5)
The article Z’s Still Not Dead Baby, Z’s Still Not Dead by Andy Clarke (24 Ways) is well-written, very interesting and taught me a few new CSS tricks of which I was unaware.
Granted, my work usually doesn’t call for fancy effects like those you can achieve with something like `background-blend-mode`, but it can happen. There’s not only `background-blend-mode`; there’s also `mix-blend-mode` and `filter`, all of which apply high-quality effects dynamically.
In the late spring, I had a two-month project where I had to use a lot of transformations and animations—and I was able to get it all done with CSS. Once you know about these kinds of techniques, you keep them in mind, and are able to consider solutions that would seem impossible (or very difficult/time-consuming/unmaintainable) if you didn’t know the technique.
A modern browser can construct the following image by composing and blending a couple of graphics.
It’s actually pretty cool that you can get this type of layout with wide browser support and no hacks. See the linked article for a lot of examples.
I have used CSS Grid before (as the author does). The author mentions subgrids, but ends up using a second grid within the first grid because browser support for nested grids is good, whereas no-one supports subgrids except for the latest version of Firefox. The MDN documentation for Subgrids explains that it differs from nested grids in that
“If you set the value subgrid on grid-template-columns, grid-template-rows or both, instead of creating a new track listing the nested grid uses the tracks defined on the parent.”
The linked page includes many examples and more detail.
As with any advanced techniques, you have to take into account your own target browsers to see whether you can use them in your own projects. It’s a well-written article and I learned a few more techniques that I can hopefully use at some point.
Published by marco on 30. Nov 2019 15:36:51 (GMT-5)
Updated by marco on 4. Oct 2023 21:28:45 (GMT-5)
The discussion React in concurrent mode: 2000 state-connected comps re-rendered at 60FPS (YCombinator) is illuminating mostly in that it shows how ego can impede productivity.
Ego can also be that thing that drives a talented programmer to create something of use to the rest of us, but that’s honestly a very rare case. More often than not, the best case is that a developer improves their skills—and perhaps learns to be more humble instead of shooting off their mouth about how “easy” it is to create a “good” product. Such claims are nearly always made without defining what they mean by “good”.
Some comments are from programmers more interested in a pissing contest of who can write performant code on their own. Their implementations often focus, laser-like, on a specific use case not often found in nature, without tackling the tough question of how to design a more generalized solution that incorporates and balances more than just the one aspect of the system that they think they’re good at (e.g. performance).
That is, they tend to carefully define the application domain based on what they’re already good at. This is not how product development works. Many of the commentators get distracted by the overreaching claims of the reposter (faster than any other WebGL rendering, which is patently not true) instead of reading the much more reasonable claims of Dan Abramov, who is the original poster.
Thankfully, there are others who seem to understand that giving up a logical, declarative paradigm in order to do so is not an acceptable tradeoff in almost any given project. What are some facets other than performance that contribute to a good solution?
Products that try optimizing all facets generally never see the light of day or serve as the base material from which more viable projects are born.
A higher level of abstraction is a good thing. It allows mediocre programmers (and be happy if you have even mediocre programmers) to write programs that aren’t a nightmare to maintain or refactor. It allows good developers to very quickly write maintainable programs. If the underlying framework has a declarative and easily understood paradigm that has only a handful of orthogonal concepts and it offers great performance by default, that’s a win. There are few projects that need spectacular performance as their main feature.
I would argue that most web programming is about making line-of-business apps and pages where look and feel matters, but not so much that it’s worth investing 50% more budget to get near-perfect and smooth updates. If it janks, it janks. There is no time or budget (or, sometimes, programming skill) to “fix” it. And, if “fixing” it means abandoning the high-level declarative programming model that makes working with React so efficient, maintainable and productive, then that’s even more implicit cost bound up in it.
As the commentator Onion2k put it:
“This is a demo of good performance using a web framework on top of a WebGL framework. It’s showing that a future version of React will make building a solid 60fps web app UI […] within the reach of most web developers. Sure, you can hand-roll code to get that performance today if you know how, but this is about putting that performance in the hands of developers who can’t (or, more often, aren’t given the resources to). To argue that is unnecessary or actually bad is ridiculous. Libraries that make it easier to build better apps are universally good things. (Emphasis added.)”
To use React, you have to make concessions to its reactive model in your application definition. But that’s the way programming works. Instead of writing “a person must have a company, while the company has a possibly empty list of people”, we write (example from Quino),
Elements.Module.Company
.AddOneToManyRelation(Elements.Module.Person)
Programming is all about explaining what an application does. The programming language and framework and runtime balance all of the factors listed above to be able to transform the formulation most accessible to a product owner (“I want a CRM”) through a business analyst (“It has a list of companies, each of which has a list of people”) to a programmer (formulation above).
The formulation above is still quite high-level, but satisfactory for 99% of cases. For the remaining 1%, the API has to provide some way of digging into the underpinnings of the implementation without dropping the developer off of a cliff. Quino does this reasonably well, as does React. The focus here is on realizing that a framework’s ability to accommodate that 1% of use cases smoothly is only one aspect of its effectiveness. Given that it doesn’t come up very much, it makes no sense to focus too much effort on optimizing that path, no matter how much more interesting it would be to the developers to do so.
This is one of those silly blogs-posted-as-tweets, but the points in Is Concurrent Mode just a workaround for “virtual DOM diffing” overhead? […] by Dan Abramov (Twitter) are good.
The point is that Concurrent Mode is not a speed improvement only for React. It also improves how your app’s code updates and is scheduled without you having to change your code (much, or at all). The linked article explains how this sea change in rendering components forms the basis of many other performance improvements that apply to existing applications without modification.
It’s exciting that a near-future version of React will make animations and updates even smoother than they are now, even though they are already more than good enough for most apps without tweaking.
React is not a game-programming framework. It makes no sense to claim that React apps will blow away apps written in Unity. We make line-of-business apps with it. React already allows apps to have much better update characteristics with almost no code other than a few functional declarations to define rendering and components and the state that they rely on.
The model is unimpeachable in that it accurately reflects the application model without adding any ceremony.
You make some concessions in order to define your declarations about your program’s logic and states so that the framework can optimize as much as it can, but no more. With hooks, you can declare simple, mutable state or one-time, partially mutable state (memos and callbacks), listeners for lifecycle events (effects) and so on.
On the one hand, you’re forced to define your logic using React’s idioms but, on the other, they still make sense in that they make your assumptions about your app’s logic explicit rather than implicit. Once you’ve done this, the framework knows more about what it can optimize away and what it can’t. And you haven’t wasted time because you’re technically describing salient properties of your application domain.
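To make those idioms concrete, here is a minimal sketch in TypeScript/TSX of the hook vocabulary described above: plain state, a memo, a callback, and an effect. The `SearchBox` component and its props are invented purely for illustration.

```tsx
import { useState, useMemo, useCallback, useEffect } from 'react';

// Hypothetical component, just to illustrate the idioms mentioned above.
function SearchBox({ items }: { items: string[] }) {
  // Simple, mutable state: React re-renders when the setter is called.
  const [query, setQuery] = useState('');

  // Memoized value: recomputed only when `items` or `query` changes.
  const matches = useMemo(
    () => items.filter(item => item.includes(query)),
    [items, query]
  );

  // Stable callback identity across renders, as long as its dependencies don't change.
  const clear = useCallback(() => setQuery(''), []);

  // Lifecycle-style effect: runs after render whenever `query` changes.
  useEffect(() => {
    document.title = `Search: ${query}`;
  }, [query]);

  return (
    <div>
      <input value={query} onChange={e => setQuery(e.target.value)} />
      <button onClick={clear}>Clear</button>
      <ul>{matches.map(m => <li key={m}>{m}</li>)}</ul>
    </div>
  );
}
```

Each hook is a declaration about the component’s state and dependencies, which is exactly the information the framework uses to decide what it can skip.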
That’s the idea behind the `<Suspense/>` component: the app can declaratively determine how it would like components to be updated in different asynchronous situations involving multiple asynchronous tasks. Concurrent Mode allows the framework to work before that update is technically complete because it allows any work to be interrupted—and discarded, if it is no longer relevant.
This allows the reconciliation to benefit a bit from something like the branch predictor in a CPU, where speculative branches are executed in parallel and occasionally discarded. JavaScript imposes a cooperative rather than parallel model, but low-level support for interruptibility (especially when automatically applied) is worlds better than nothing.
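As a rough sketch of that declarative idea, assuming hypothetical `ProfileDetails` and `ProfileTimeline` components that read from Suspense-enabled data sources, the tree itself states what to show while each asynchronous part is pending:

```tsx
import { Suspense } from 'react';

// Hypothetical components that read data from Suspense-enabled sources.
import { ProfileDetails, ProfileTimeline } from './profile';

// The tree declares what to show while each asynchronous part is pending;
// the framework decides how to schedule, interrupt, and discard the work.
function ProfilePage() {
  return (
    <Suspense fallback={<h2>Loading profile…</h2>}>
      <ProfileDetails />
      <Suspense fallback={<p>Loading timeline…</p>}>
        <ProfileTimeline />
      </Suspense>
    </Suspense>
  );
}
```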
Any language—and the combination of the underlying programming language and the framework API, combined, is the language a programmer uses—must have a shape, a paradigm that it enforces. Naturally, a programmer can use a different paradigm than the recommended one. But a good framework finds the balance between a paradigm that is comfortable for a large part of its audience and one that is enough of an abstraction that it has a lot of leeway for applying to the next layers down (until it gets to machine code).
A good framework provides an out-of-the-box experience that provides a clearer programming idiom and better performance than most programmers could do on their own.
In the thread above, Abramov in no way claims that it’s not possible to create a faster application for thousands of components, just that the new renderer is much, much faster than the old one without changing the programming idiom at all. The programming idiom in React is very good, if not great. This is really good news.
Instead, you could say that Abramov’s claim is that anyone who claims to have made a faster renderer is making tradeoffs in other areas (e.g. from the list above). Most likely, the resulting balance is not as good as the clear, declarative syntax of React or it doesn’t cover nearly as many use cases.
Is React’s syntax the best it can be? Maybe not yet. For example, a component declares mutable, internal state with the `useState()` hook, which returns a state variable and a “setter” function to change that state. Svelte, for example, improves on this by allowing the app to just declare the state variable and automatically noticing when that state is updated and generating the state-update code in the transpilation phase. This is an improvement that allows an app developer to work even closer to “normal” code than before.
If Svelte can provide this clearly more readable feature without introducing problems in other facets (e.g. learnability, performance, completeness), then it’s a clear win.
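A side-by-side sketch of the difference described above; the React half is ordinary `useState()` usage, while the Svelte half is paraphrased from its tutorial-style counter and shown in comments, so treat it as an illustration rather than canonical Svelte:

```tsx
// Counter.tsx (React): state is declared via the useState() hook and changed
// only through the setter that the hook returns.
import { useState } from 'react';

export function Counter() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(count + 1)}>{count}</button>;
}

// Counter.svelte (inside its <script> block): a plain declaration; the Svelte
// compiler notices the assignment and generates the DOM-update code itself.
//
//   let count = 0;
//   const increment = () => { count += 1; };
```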
A similar kind of improvement is `async/await`. This feature didn’t actually change how asynchronous code works. Instead, it allowed a programmer to write synchronous code that could be made asynchronous automatically.
This is a sea change for most developers—even those clever and experienced enough to have written that level of asynchronous code themselves. The point is that the developer is no longer wasting time writing what amounts to boilerplate code that is very error-prone and difficult to thoroughly test (which means that it’s often not thoroughly tested).
The idiom of `async/await` imposed minimal “noise” (none, actually) and has a tremendous upside. The code doesn’t necessarily get faster, but it could be made faster without changing it.
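A small example of what that means in practice, using a hypothetical `/api/users` endpoint: the two functions below do the same thing, but only the first spells out the asynchronous plumbing by hand.

```ts
// Promise-chaining style: the asynchrony is spelled out by hand.
function loadUserName(id: string): Promise<string> {
  return fetch(`/api/users/${id}`)            // hypothetical endpoint
    .then(response => response.json())
    .then(user => user.name);
}

// async/await: reads like synchronous code, but compiles down to the same
// asynchronous machinery; nothing blocks while awaiting.
async function loadUserNameAsync(id: string): Promise<string> {
  const response = await fetch(`/api/users/${id}`);  // hypothetical endpoint
  const user = await response.json();
  return user.name;
}
```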
The comment on Fiber Principles: Contributing To Fiber by sebmarkbage (React/Github) is another well-written contribution to this discussion that shows that there are a lot of clever people working on React that are aware of the fine balance between the requirements involved in writing a strong framework.
The user responds to accusations that much of this work would not be necessary if JavaScript had proper threading. The author argues that globally mutable prototypes are an intrinsic concept that is used in many, many JavaScript use cases. However, they also limit the ability of ever bringing threads to JavaScript. The language is limited from the get-go.
That doesn’t mean we should all stop using JavaScript. It just means that this is something that goes in the cons list and must be weighed against all of the pros. Anything that is in the cons list must be compensated with effort. JavaScript has many pros going for it: for example, it’s won the client-side programming-language war.
Perhaps WebAssembly will replace it as a runtime, but only time will tell. By then, it won’t matter, because we’ll be using languages like Elm or TypeScript to write our code. Even this doesn’t matter, though, because these languages must also transpile to the underlying paradigm defined by an engine that must run JavaScript.
That goes—for now—for WebAssembly targets as well. And threading is out for any of this stuff. Until something in this situation changes and we can target a threaded execution engine on the client side, we should be happy that there are very clever people making cooperative multi-tasking transparent and easy to program for the rest of us.
Those of us who worked on Apple OSs before OS X or Windows before 95 know what it’s like to have to deal with cooperative multi-tasking in our own code. I welcome the declarative paradigm that allows excellent performance for a wide range of use cases without making me write and maintain a whole bunch of code that has nothing to do with my application domain.
There’s a reason why everyone with sense is talking about avoiding shared, mutable state. Using shared, mutable state makes it very easy to write the happy path of a single use case, but it makes it very hard to reason about other use cases and branches. It doesn’t scale, extend, test or maintain well. If these requirements don’t apply to your application—e.g. a script or one-off throwaway prototype—then you might be fine.
I would personally advise against practicing or becoming accustomed to techniques that apply to one use-case but that are dangerous in all other situations. You’ll generally end up using the technique to which you’ve become accustomed. While training yourself to build high-quality solutions risks the danger of over-engineering solutions to problems that could have been solved more simply, it’s easier to “downscale” your coding style than to “upscale” it.
With enough practice and the right techniques, you can write quality code from the start just as efficiently as crappy code. I would also say to beware of the seductiveness of bad programming models that promise an initial speed in development that quickly drops off once it’s too late to change.
Prototypes happen to be built into the language in JavaScript’s case, but shared mutable data is the great stumbling block of concurrent programming. Applications that batch work into parallelizable chunks can be optimized to run more quickly by a clever runtime.
It is much simpler to reason about an application without shared mutable data. There are fewer cases and branches. Otherwise, an application must use locks (or fences or some other synchronization concept). The point is that efficient synchronization is not easy, and many naive implementations tend toward speed rather than robustness and are buggy as a result.
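Even without threads, cooperative asynchrony stumbles over shared, mutable state. The sketch below, with an invented counter and a stand-in for a round-trip, shows a classic lost update:

```ts
// Shared, mutable state: two interleaved async updates can lose an increment,
// because each one reads the counter before the other has written it back.
let counter = 0;

async function incrementViaServer(): Promise<void> {
  const current = counter;                    // read shared state
  await new Promise(r => setTimeout(r, 10));  // e.g. a round-trip elsewhere
  counter = current + 1;                      // write back a stale value
}

async function demo() {
  await Promise.all([incrementViaServer(), incrementViaServer()]);
  console.log(counter); // 1, not 2: a lost update without any threads at all
}

demo();
```

Both calls read the counter before either writes it back, so the result is 1 rather than 2; immutable data or a single owner of the state avoids this whole class of bug.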
Though it’s possible to hand-code faster concurrency than standard frameworks, most people can’t do it. And, given time, framework implementations get really, really good at optimizing nearly all cases. C# and .NET, for example, have a tremendously clever runtime underlying `async/await` now that can hardly be beaten for throughput, scheduling, etc. Successive versions have built on new language concepts introduced precisely to allow an application—where needed—to be more declarative in ways that allow even more optimization (e.g. record references, etc.).
It’s nice to see that Concurrent React—much like `async/await` in JS—provides a simple idiom for moving that effort out of the hands of most developers.
Naturally, a developer is free to do that work on their own—and many commentators in the original thread at the start of this article seem to enjoy writing code that has nothing to do with their actual app just to show that they can. But with enhancements like `async/await` or Concurrent React, they don’t have to in order to enjoy performance benefits. That’s a win-win—a free lunch.
The point made above by Onion2k is very salient: very often “developers [aren’t] given the resources to” make the kind of optimizations that React will provide for free. Could a given rockstar developer write something even faster for exactly their application domain? Probably. Are they going to be given the time and budget to do so? Almost certainly not. It’s far better to have a good default that is smooth as silk and more than adequate to the task for almost all conceivable applications.
No-one’s paying you to reinvent the wheel. That’s almost certainly not your job. If you’d like it to be your job, then maybe you should work on a project where you’re inventing the wheel directly (i.e. a framework project). Then, you can build on that experience and your framework to turn around tightly written, maintainable and performant applications for your paying customers.
It’s important to be pragmatic and remember when you’re working on framework code and when you’re working on code that benefits from framework code without reinventing it. Otherwise, you’ve got a terrible situation: you invest in framework/infrastructure on every single project because you never reap the benefits of having written a framework. In the case of frameworks that are completely external to your application, like React (or Quino), you never even had to invest in writing the framework at all.
If you write a framework for just expert developers, there will be no adoption and you don’t help a large part of the community to write better apps. But what do we mean by better?
Continuing with React as an example, the abstract requirements at the start of this article roughly map to:
An application should have to only declare things about itself that are relevant to itself—but that also help to render the application better. Again, these idioms should scale: an application which will not have foreseeable performance issues in most components should be able to write those components with more approachable code.
Individual “islands” of code can provide additional information to optimize hotspots (like memoization, immutability hints, etc.) It’s important to note that these concepts are not introduced by the framework—they are intrinsic to the application’s domain model, but usually kept implicit.
If the application does not describe these aspects of itself, then the framework must make more pessimistic assumptions. Often this doesn’t matter. Where it does matter, the application should be able to use compatible and familiar idioms to improve the granularity of its description about itself. This, in turn, lets the framework use a faster approach where it now knows that it won’t violate the application’s definition.
The simplest of these is to tell React which parts of the state are mutable and which are immutable. When determining what has changed in an application state, a framework can simply compare the reference to the root node of an immutable object graph to the previous root-node reference to determine if that part of the graph has changed. If the object graph does not declare itself as immutable, then the framework must be pessimistic and compare the entire subgraph to determine if it has changed.
This is a concept that is intrinsic to programming. It is hard to conceive of it ever not being relevant. Naturally, if there is more than enough processing power available or the graph is small enough, it won’t matter, but it’s still axiomatically more work to compare potentially mutable graphs than immutable graphs. If an application fails to express immutability where it could have, that small missing bit of information reduces flexibility in choosing an algorithm.
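A minimal sketch of the two change checks, assuming nothing about React’s actual implementation: one relies on the immutability declaration, the other has to walk the graph pessimistically (and this version doesn’t even handle cycles).

```ts
// If a subtree declares itself immutable, "has it changed?" is one reference check.
function hasChangedImmutable<T>(previous: T, next: T): boolean {
  return previous !== next; // an unchanged immutable graph is the same object
}

// Without that declaration, the framework has to be pessimistic and walk the graph.
// (Sketch only: no cycle detection, no special-casing of dates, maps, etc.)
function hasChangedDeep(previous: unknown, next: unknown): boolean {
  if (previous === next) return false;
  if (typeof previous !== 'object' || typeof next !== 'object' ||
      previous === null || next === null) {
    return true;
  }
  const keys = new Set([...Object.keys(previous), ...Object.keys(next)]);
  for (const key of keys) {
    if (hasChangedDeep((previous as any)[key], (next as any)[key])) return true;
  }
  return false;
}
```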
This is not a new thing: most functional languages have immutability baked in as the default. Even C has the notion of `const` and `volatile` to give hints to the compiler about how it can deal with that data. Naturally, higher-level languages try to abstract away these concepts, but it constrains all the layers below.
On this subject, another unavoidable concept is nullability: is a reference assigned or not? Most new languages (and newer versions of languages, like C#) are switching from the age-old—and convenient-for-the-compiler—default of nullable references. Again, reference assignment is a core concept that is unavoidable when thinking about code with pointers.
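A tiny TypeScript illustration, assuming `strictNullChecks` is enabled: the type states whether the reference can be absent, and the compiler only demands a check where absence is actually possible.

```ts
// With strict null checks, the type itself says whether a reference can be absent.
function companyNameLength(name: string): number {
  return name.length;            // no null check needed: `name` cannot be null
}

function maybeNameLength(name: string | null): number {
  return name === null ? 0 : name.length; // the compiler forces the check here
}
```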
Another concept that limits choosing a more performant transformation during compilation is failing to express function purity. Does a function cause a side-effect? A compiler can optimize a function known to be pure in ways that it cannot with impure functions.
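A sketch of the distinction, with invented VAT helpers: the pure version can be cached, reordered, or folded at compile time, while the impure one pins down call order and count.

```ts
// Pure: the result depends only on the arguments, with no side-effects.
// Calls can be reordered, cached, or computed ahead of time.
function vatInclusive(price: number, rate: number): number {
  return price * (1 + rate);
}

// Impure: writes state outside its arguments, so the caller (and the optimizer)
// must assume that the order and count of calls matters.
let auditLog: string[] = [];
function vatInclusiveLogged(price: number, rate: number): number {
  auditLog.push(`calculated VAT for ${price}`); // side-effect
  return price * (1 + rate);
}
```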
All of these features are a balance between programmer convenience, onboarding of new developers, and allowing programmers to focus on application logic rather than making concessions to the language and framework. As discussed above, though, there are concepts intrinsic to programming that have ostensibly nothing to do with application logic, but that an application declares (if not explicitly, then implicitly).
Taking the example from above, if an application declares that a person is in a company, but fails to mention that a person must be in a company, then the underlying software (framework and compiler) must be more pessimistic about that relationship than is strictly necessary.
A good framework encourages software to be precise about its own model by allowing the application to declare the salient parts of its model in a declarative minimal set of idioms.
Published by root on 24. Nov 2019 20:55:24 (GMT-5)
The article In Defense of Utility-First CSS by Sarah Dayan on January 15th, 2018 (Frontstuff) is very long [1], so I’ve summarized a bit with notes and thoughts. [2]
I don’t really care about being pedantic without first knowing some facts. What are the requirements?
If atomic/utility CSS can deliver these things, then it’s probably a fine tool. But—spoiler alert—it seems more like a tool for designers—not programmers. Programmers have other, better tools for building CSS in a way that fulfills the requirements above.
Essentially, these designers are like we programmers used to be: we used to care about cascading when we were still hand-coding our CSS. Now that we’re using LESS or another generator, we can use variables and functions for theming and use local CSS for precision. We can lean on specificity when it suits us and avoid it when it only gets in the way.
We want to declaratively say how we want everything to look and let our tools (LESS, WebPack with plugins) figure out how best to generate the CSS to accommodate supported browsers and also to create the kind of CSS that performs well without blowing up memory client-side. None of these optimizations and accommodations for targets should be up to the programmer/designer/CSS-writer at this point.
Utility-CSS feels functional, but it also feels like something you use when you don’t have LESS. I’ve never used BEM, and I agree that it never really made sense: it departs from several good coding practices, like DRY. That the author is coming from BEM to utility-CSS is not a surprise: BEM was never a good idea.
“Early refactors are a pretty good indicator of unmaintainability.”
I don’t agree. It’s more a sign of shifting priorities or requirements. It’s not uncommon in agile development. The example the author has of changing the meaning of a “card” after there are already components using that style just means that you should make a “card2” class (not a “card-no-ribbon” one) because it’s just a different card type.
The problem is that the design now includes two cards, not that your implementation should somehow be able to easily roll with a confusing design.
Where I see a problem is when a card is supposed to have a certain padding and a border with a certain color (let’s say the “padding-top-8” and “border-bottom-lemon” from the author’s example). But then you don’t want those anymore.
Granted, with proper components, you’ll only have to change the style in one place anyway, right? So it doesn’t matter what you call it. You could have just called it “card” in the local styles and been done with it. So, either you have to remove those highly specific styles in many places in your HTML (as with an old-style web site, like earthli) or you change it in one place anyway (new-fangled, with React components).
I guess it’s the difference between knowing from the HTML what the component is going to look like (`<blockquote class="border-thick-left-red padding-left-medium font-navy">`) and knowing what the component is (`<blockquote class="newspaper">`).
The author writes:
“Yet, the bigger and the more complex a component gets, the less obvious it is to know what class name maps to what element on the screen, or what it looks like.”
But then they include an example where it’s absolutely clear which components do what:
<div class="entry">
  <h2 class="entry-title">The Shining</h2>
  <div class="widget widget-lead">
    <div class="widget-content">
      <p>His breath stopped in a gasp…</p>
    </div>
    <div class="author">
      <img class="author-avatar" src="…">
      <h3 class="author-name">Stephen King</h3>
      <p>Stephen Edwin King …</p>
      <div class="btn-group">
        <a class="btn" href="#">Website</a>
        <a class="btn" href="#">Twitter</a>
      </div>
    </div>
  </div>
</div>
I think this again shows the difference between programmers and designers: the code above is crystal clear to a programmer, so if a programmer is writing the CSS, then there’s no need to change anything.
The author seems to be a designer hell-bent on knowing exactly what the page will look like without actually showing it in a browser. I wish they’d included the version with utility CSS … it would have been a giant block of unreadable code, doubled in size with class names.
The author makes a good case for theming using CSS variables, which can be applied “at runtime” in the browser. The solution to theming with utility CSS turns out to be … making semantic styles instead of precisely named styles. So…not utility CSS.
The author references a few other articles, one of which is Kiss My Classname by Jeffrey Zeldman, which eloquently argues that there is nothing to change. He instead argues that developers and designers should use a visual style guide.
“I don’t believe the problem is the principle of semantic markup or the cascade in CSS. I believe the problem is a dozen people working on something without talking to each other.
“Slapping a visually named class on every item in your markup may indeed make your HTML easier to understand for a future developer who takes over without talking to you, especially if you don’t document your work and create a style guide. But making things easier for yourself and other developers is not your job. And if you want to make things easier for yourself and other developers, talk to them, and create a style guide or pattern library.”
“The present is always compromised, always rushed. We muddle through with half the information we need, praised for our speed and faulted when we stop to contemplate or even breathe. (Emphasis added.)”
Another article they referenced was CSS Utility Classes and “Separation of Concerns” by Adam Wathan on August 7th, 2017 and it’s even longer. It’s almost a jeremiad with the seeming intent of breaking the reader down with a flood of words. I could only skim it, but it seems like these people are styling without programming: that is, some of the utility classes and even the slightly semantic ones they use could very easily be written more cleanly if they just used component-local styles.
For example, this is completely unnecessary with local styles, because you don’t have to worry about specificity biting you in the ass:
<div class="media-card">
  <img class="media-card__image"
       src="https://i.vimeocdn.com/video/585037904_1280x720.webp" alt="">
  <div class="media-card__content">
    <h2 class="media-card__title">Stubbing …</h2>
    <p class="media-card__body">
      In this quick blog post and screencast, …
    </p>
  </div>
</div>
In another article, On the Growing Popularity of Atomic CSS by Ollie Williams on November 24th, 2017, the author mentions that they’re addressing “a mixed-ability team, perhaps involving backend developers with limited interest and knowledge of CSS”. I didn’t have the energy to finish that one either, because a skim indicated that it repeated a lot of what was in the article I did read.
Published by marco on 17. Oct 2019 14:42:13 (GMT-5)
Azure DevOps allows you to link multiple accounts.
Our concrete use case was:
Are we clear so far? U1 and U2 are linked because reasons. U1 is old and busted; U2 is the new hotness.
The linking has an unexpected side-effect when managing SSH keys. If you have an SSH key registered with one of the linked accounts, you cannot register an SSH key with the same signature with any of the other accounts.
This is somewhat understandable (I guess), but while the error message indicates that you have a duplicate, it doesn’t tell you that the duplicate is in another account. When you check the account that you’re using and see no other SSH keys registered, it’s more than a little confusing.
Not only that, but if the user to which you’ve added the SSH key has been removed from the organization, it isn’t at all obvious how you’re supposed to access your SSH key settings for an account that no longer has access to Azure DevOps (in order to remove the SSH key).
Instead, you’re left with an orphan account that’s sitting on an SSH key that you’d like to use with a different account.
So, you could create a new SSH key _or_ you could do the following:
If you can’t add U1 to O1 anymore, then you’ll just have to generate and use a new SSH key for Azure. It’s not an earth-shatteringly bad user experience, but it’s interesting to see how several logical UX decisions led to a place where a couple of IT guys were confused for long minutes.
Published by marco on 17. Oct 2019 13:27:26 (GMT-5)
Updated by marco on 11. Mar 2021 14:33:13 (GMT-5)
I’ve written about using SmartGit (SG) before [1] [2] and I still strongly recommend that developers who manage projects use a UI for Git.
If you’re just developing a single issue at a time and can branch, commit changes and make pull requests with your IDE tools, then more power to you. For this kind of limited workflow, you can get away with a limited tool-set without too big of a safety or efficiency penalty.
However, if you need an overview or need to do more management, then you’re going to sacrifice efficiency and possibly correctness if you use only the command line or IDE tools.
I tend to manage Git repositories, which means I’m in charge of pruning merged or obsolete branches and making sure that everything is merged. A well-rendered log view and overview of branches is indispensable for this kind of work.
I have been and continue to be a proponent of SmartGit for all Git-related work. It not only has a powerful and intuitive UI, it also supports pull requests, including code comments that integrate with BitBucket, GitLab and GitHub, among others.
It has a wonderful log view that I now regularly use as my standard view. It’s fast and accurate (I almost never have to refresh explicitly to see changes) and I have a quick overview of the workspace, the index and recent commits. I can search for files and easily get individual logs and blame.
The file-differ has gotten a lot better and has almost achieved parity with my favorite diffing/merging tool Beyond Compare. Almost, but not quite. The difference is still significant enough to justify Beyond Compare’s purchase price of $60.
What’s better in Beyond Compare [3]?
I could live without the Beyond Compare differ, but not without the merger.
To set up SmartGit to use Beyond Compare:

- Differ (two-way file comparison):
  - Command: `C:\Program Files (x86)\Beyond Compare 4\BCompare.exe`
  - Arguments: `"${leftFile}" "${rightFile}"`
- Merger (conflict solver):
  - Command: `C:\Program Files (x86)\Beyond Compare 4\BCompare.exe`
  - Arguments: `"${leftFile}" "${rightFile}" "${baseFile}" "${mergedFile}"`
I was testing the Git support in Visual Studio Code and ran into a somewhat surprising limitation. For those that use IDE Git integration without an external tool, this would be a pretty disappointing message. What do you do then?
Published by marco on 17. Oct 2019 07:38:00 (GMT-5)
Visual Studio 2019 (VS) asked me this morning if I was interested in taking a survey to convey my level of satisfaction with the IDE.
VS displays the survey in an embedded window using IE11. [1] I captured the screen of the first thing I saw when I agreed to take the survey.
I know it’s the SurveyMonkey script that’s failing, but it’s still not an auspicious start.
Published by marco on 19. May 2019 17:15:28 (GMT-5)
Updated by marco on 13. Jan 2022 09:53:18 (GMT-5)
I’ve just read about a web framework called Svelte in the post Virtual DOM is pure overhead. I think the product itself sounds interesting, but that the author uses unnecessarily misleading arguments.
From what I gather, Svelte is a compile-time reconciliation generator for JSX/TSX components. This pre-calculated generator applies changes to the DOM without needing a virtual DOM and without real-time diffing or reconciliation. That is, instead of having real-time calculation, with possible performance hits [1], the app benefits from having all possible state changes pre-calculated and ready to apply immediately and quickly.
This all sounds pretty good, I think. I’m definitely going to take a look at the more-advanced tutorials. [2]
However, the author wasn’t happy with just presenting his product, but seems to need to mischaracterize why products like React abstracted away from the DOM in the first place. He tells us that the virtual DOM was always slower than manipulating the DOM. But that isn’t the claim React makes. React helps users avoid common performance pitfalls in the model of programming that it replaced—it never claimed to be the final word in performance optimization.
It’s clear that something like Svelte—if it can cover all the needs of an app—is faster than maintaining a virtual DOM.
But that product isn’t what React replaced. React replaced products written in jQuery. React brought an asynchronous frame-based renderer to the web (something that products like WPF have had for decades). It brought us type-safe views (when used with TypeScript) and taught us about the advantages of immutable data structures.
The author characterizes the notion that a virtual DOM is faster as a “meme”. This is silly and imprecise. It is true that React will be more efficient than most hand-coded web sites of a typical level of complexity. jQuery sites tended to teeter and collapse under their own weight. They were unmaintainable and very difficult to optimize without nearly rewriting them. React sites, on the other hand, are modular in nature and the library includes several standard patterns to apply and measures to take to optimize these components. It’s not always easy, but it’s better than it was in the old days.
And there are solutions in React to performance issues. The users must follow patterns and use the APIs correctly. That’s the way it is in every framework or library. Some libraries offer less leeway for users to screw up performance in the way that they shape their APIs.
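As one example of the kind of pattern meant here (not an exhaustive recipe), memoization is declared explicitly; the `Row` and `List` components are invented for illustration:

```tsx
import { memo, useMemo } from 'react';

// Hypothetical row component: `memo` skips re-rendering it when its props
// haven't changed between renders of the parent.
const Row = memo(function Row({ label }: { label: string }) {
  return <li>{label}</li>;
});

function List({ items, filter }: { items: string[]; filter: string }) {
  // `useMemo` avoids re-filtering when `items` and `filter` are unchanged
  // (e.g. when unrelated state elsewhere in the tree changes).
  const visible = useMemo(
    () => items.filter(item => item.includes(filter)),
    [items, filter]
  );
  return <ul>{visible.map(label => <Row key={label} label={label} />)}</ul>;
}
```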
Sometimes the API surface goes too much in that direction and ends up handcuffing users. That is, users can’t write what they want to write in a way that feels natural because the pattern they prefer wouldn’t perform well under their framework. Instead, the user must change how they think about writing apps just to use the framework. This isn’t necessarily a bad thing, but is definitely something to consider. It’s possible that Svelte offers all of the advantages of React with even more flexibility and less opinion.
React—and its companion Redux—was always about being very declarative about state and changes. There is no magic; even the reconciliation algorithm is very predictable. There are other approaches, like MobX, which users claim “does the right thing” with state changes, even if the user fails to declare dependencies as clearly as React would have required. [3] I imagine that Svelte is going in this direction as well.
The claim I think that Svelte is making is that users can write code that feels more natural without changing their paradigm to match the framework. That is, Svelte must have some rules for which state the compiler observes and pre-compiles, but the claim is that it’s much more flexible and forgiving than React’s “straitjacket” (my word). [4]
He goes on to say that React acknowledges its own slowness by giving the user control over `shouldComponentUpdate`. This is a silly argument again. It’s arguing that React bamboozled people in 2013 by convincing them to use their framework instead of a library that the author purports is faster, but that he only started writing in 2017.
There is honestly no need for this kind of bullshit. If your library offers advantages over React, describe them and let them speak for themselves. There is no need to rewrite the whole history of a product that quite clearly inspired your own, pretending that the authors of your own framework’s inspiration are your inferiors because they failed to leap directly to the concepts outlined in your library. He stands on their shoulders, then implies that they were idiots for not having been taller.
Through all of this fluff, it took to about ¾ of the way through the article to find out that Svelte generates update code at build time. I would have been much more intrigued had the author led with that. Now, I’m going to be suspicious of everything about this framework because the author went to such lengths to bamboozle and oversell me. He seems to want me to think I’ve been a fool for having used React in the first place, when his framework has been waiting for me all along, since all the way back to sometime in 2018. [5]
But he waits until the very last paragraph to explain what Svelte actually is—even though he’s been comparing it to React the entire time. It’s a good description:
“It’s important to understand that virtual DOM isn’t a feature. It’s a means to an end, the end being declarative, state-driven UI development. Virtual DOM is valuable because it allows you to build apps without thinking about state transitions, with performance that is generally good enough. That means less buggy code, and more time spent on creative tasks instead of tedious ones.
“But it turns out that we can achieve a similar programming model without using virtual DOM — and that’s where Svelte comes in.”
This is a much fairer characterization of the two libraries: they both base on a very similar model—one that React did a tremendous amount of legwork in establishing as an attractive approach in people’s minds—but that Svelte goes a step further to improve the reconciliation mechanism, moving it from runtime to compile-time. Svelte’s improvement could be a highly welcome one, but it’s incremental, not revolutionary.
That’s wonderful! But it’s actually even more wonderful than his article indicated, because I actually don’t have to learn anything to work with Svelte instead of React. I can work pretty much the same (Svelte doesn’t have hooks [6] because it seems it doesn’t need them) and just kind of “drop in” Svelte instead of React and have better performance, even in places where I’d never noticed I might have had problems.
That is, with Svelte instead of React, my app will be overall faster because performance no longer suffers from “death by a thousand cuts”, as the author puts it. Despite the author’s overzealous mischaracterizations and attempts at hot-take marketing, I’m still going to check out Svelte.
I’m not sure what MobX 5 is up to or what introspection it offers into the web of observables and dependencies in a more-complex application, but older versions of the library were not easy to debug when performance problems arose. From what I’ve read from users, things have gotten much better, but I’m still inclined to think that React’s declarative approach suits me better—it’s easier for me to apply well-established patterns in my own code rather than trying to figure out how to appease the MobX black box. Again, things may be different now than in earlier versions. I’m open to taking another look at MobX.
I’m also not sure how Svelte and MobX compare: MobX requires users to indicate that state is “observable” before it manages it, whereas I assume Svelte determines for itself which state-transitions it should track.
Update January 2022: In going through the tutorial available today, you’re very quickly introduced to _reactive declarations_ to help Svelte determine which compound expressions should be “watched” for changes to sub-elements. That is, if you declare a simple variable, any references to it in view code will be automatically updated, but if you derive another value from it and observe that value, it only updates when the derived value is updated directly. This is unlikely, as the derived value presumably implements an algorithm of some sort and should never be changed directly (i.e. it’s a _calculated property_ in the parlance of other frameworks). For example, given the following code,
let count = 0;
let doubled = count * 2;
Any observers (i.e. embeddings in a view) of the value of `doubled` will not be updated when `count` changes, even though the naive interpretation of a JavaScript developer would be that of course it changes. In order to get the desired effect, you must make it a reactive declaration with `$:`. For example,
let count = 0;
$: doubled = count * 2;
This is perfectly fine, but it is an example of how the “you don’t have to do anything to make your JavaScript work naturally, unlike smelly React” pitch is overselling the advantage. Missing _reactive declarations_ will cause an app to not work as expected, just as much as a missing `useState()` does in React.
The author disparages hooks, saying that they are even worse for performance and linking to a tweet with the words “with predictable results”. The tweet complains about atrocious performance because of constant reconciliation and rendering—but a dozen answers down is the answer: the original poster failed to tell the `useEffect()` hook on which state it relied.
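A reconstruction of that mistake in TypeScript/TSX; the component and `fetchResults` helper are invented, but the dependency array is the point:

```tsx
import { useEffect, useState } from 'react';

// Hypothetical component illustrating the mistake described above.
function Results({ query }: { query: string }) {
  const [results, setResults] = useState<string[]>([]);

  // Broken: no dependency array, so the effect runs after *every* render,
  // and setting state triggers another render: constant re-fetching.
  // useEffect(() => { fetchResults(query).then(setResults); });

  // Fixed: declare which state the effect relies on.
  useEffect(() => {
    fetchResults(query).then(setResults);
  }, [query]);

  return <ul>{results.map(r => <li key={r}>{r}</li>)}</ul>;
}

// Hypothetical data access, just to keep the sketch self-contained.
async function fetchResults(query: string): Promise<string[]> {
  return [`result for ${query}`];
}
```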
That’s kind of a rookie mistake—in that framework. [7] I understand that Svelte claims that it doesn’t need these hints in order to be able to determine at compile-time when a piece of code needs to be executed because the state on which it relies has changed.
React is declarative and requires help from the user whereas presumably the selling point of Svelte is that this user would have wasted less time improving performance and more time focused on application logic because Svelte is smart enough to do all this for you.
I personally think that this sounds awesome and that it is an admirable goal, but have my doubts that Svelte doesn’t also impose its own set of limitations on what kind of state transformations you can do that the compiler can actually detect.
That is, React provides an API with which callers can “help” the reconciliation algorithm avoid work. Svelte claims that this isn’t necessary, but I’m going to guess that there are rules for how sophisticated state changes can be before the Svelte compiler no longer detects them. In that case, what does Svelte do? Fall back to using a React-like virtual DOM to reconcile changes? Or just not update when the user expects? Or just fail to compile, spitting out an error indicating what the user should do to fix the issue (my personal favorite)?
Published by marco on 8. Apr 2019 09:38:17 (GMT-5)
Updated by marco on 12. Jan 2023 16:49:41 (GMT-5)
[[_TOC_]]
## Introduction
Testing is any form of validation that verifies code. That includes not only structured validation using checklists, test plans, etc. but also informal testing, as when developers click their way through a UI or emit values in debugging output to a console.
_Automated testing_ covers the topic of all regression-style tests that execute both locally and in CI. This includes unit, integration, and end-to-end tests.
Testing is primarily a mindset.
You should think of writing tests not as something you _have_ to do, but rather as something you _want_ to do.
- How else do you prove that what you wrote works?
- What does _”it works”_ mean?
- Which _use cases_ are covered?
- How do you answer these questions without tests?
- What do we mean by _writing_ tests?
## You’re already testing!
You’re almost certainly already testing.
You might be clicking through the UI or emitting statements in a command-line application, but you’re verifying your code _somehow_. I mean … you are, right? RIGHT?
I’m kidding. Of course you’re not just writing code, building it, and committing it. You’re validating it somehow.
That’s testing.
### A list of validations
If you’re really good, you might even keep a list of these validations. Once you have a list, then,
1. You don’t have to worry about forgetting to do them in the future
1. Even someone with no knowledge of the system can perform validation
This is fine, but it’s still a manual process. A manual process carries with it the following drawbacks:
1. It gets quite time-consuming, especially as the list of validations grows
1. You’re highly unlikely to perform the validations often enough
− It’s much easier to fix a mistake if you learn about it relatively soon after you made it
1. You’re also unlikely to add _all_ of the validations you need
− Generally, you won’t validate smaller “facts” and will focus on high-level stuff
1. A manual validation process can’t be run as part of CI or CD
### Automating the list
Automated testing means that you _codify_ those validations.
> 😒 Great! I have tests! How the heck do I _codify_ them?
Don’t panic. Almost any code can be tested. In fact, if you can’t get at it with a test, then you might have found an architectural problem.
See? Automating tests will even help you write better code!
> 🤨 How do I get started?
Just start somewhere. It doesn’t matter where. Don’t worry about coverage. Just get the feeling for writing a proof about a facet of your code. Any bit of logic can—and should—be tested.
What if you still don’t know where to begin? Ask someone for help! Don’t be shy. It’s in everyone’s best interest for a project to have good tests. You want everyone’s code to have tests so you know _right away_ when you’ve broken something in a completely unrelated area. This is a good thing!
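To make “just start somewhere” concrete, here is a minimal sketch of what a very first test can look like, using MSTest. The `PriceCalculator` class and its behavior are invented purely for illustration:

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// A deliberately trivial class, invented for this example only.
public class PriceCalculator
{
    private readonly decimal _discountPercent;

    public PriceCalculator(decimal discountPercent) => _discountPercent = discountPercent;

    public decimal Calculate(decimal netPrice) => netPrice * (1 - _discountPercent / 100);
}

[TestClass]
public class PriceCalculatorTests
{
    [TestMethod]
    public void DiscountIsAppliedToNetPrice()
    {
        var calculator = new PriceCalculator(discountPercent: 10);

        // 10% off of 200 is 180.
        Assert.AreEqual(180m, calculator.Calculate(200m));
    }
}
```

Once one such test exists, adding the next one is much easier: the project structure, test runner, and naming pattern are already in place.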
## Goals
> 🤸♀️ Developers should be excited to use tests to prove that their code works.
### Tests should be quick and easy (maybe even fun) to write
A project should provide support for mocking devices and external APIs, or for using test-specific datasets.
### Tests should be reasonably fast
A reasonably fast test suite will tend to be run more often. We would like a developer to notice a broken test right after the change that broke it, preferably even before pushing it.
### Avoid debugging tests in CI
Tests a developer runs locally should almost always work in CI. Failing tests in CI should also fail locally.
## Guidelines
> 🤨 Don’t be pedantic.
For example,
- [Stop requiring only one assertion per unit test: Multiple assertions are fine](https://stackoverflow.blog/2022/11/03/multiple-assertions-per-test-are-fine/)
- Don’t forbid mocking in integration tests and don’t force mocking in unit tests.
− In fact, stop worrying about whether it’s a unit or an integration and just _write useful tests_ that _prove useful things_ about your code.
- Don’t get obsessed with automating _everything_.
− Get the low-hanging fruit first, and leave the rest to manual testing.
− See where you stand.
− If you haven’t automated enough, iterate until done. 🔄
### Tests should be useful
We never want anyone in a team to get the impression that we’re writing tests just to write tests. We write tests because they help us write better code and because it feels good to be able to prove that something that was working continues to work. You should feel more efficient and productive and feel like you’re producing higher-quality code.
- Tests should confirm use cases
- Tests should prove something about your code that you think is worth proving.
- Tests should confirm behavior that either is how the code _currently_ works or how it _should_ work.
- Tests should help you write better code from the get-go.
- Every bug that you need to fix is de-facto a use case that needs a test.
### Code Coverage & Reviews
How do you know when there are “enough” automated tests?
Don’t get distracted by trying to achieve a specific coverage percentage. The most important thing is that the major use cases are covered.
If software is stable and there is “only” 40% test-coverage, then maybe there is a lot of code that rarely or never gets used. In that case, you might want to think about removing code that you don’t need rather than wasting time writing tests for code that never runs.
New code, though, should always have automated tests. A **code reviewer** should verify that new functionality is being tested.
## Types of tests
| Type | Definition | When to use them |
| --- | --- | --- |
| Unit | Cover a single unit, mocking away other dependencies where needed | Useful for verifying simple logic like calculated properties or verifying the results of service methods with given inputs |
| Integration | Cover multiple units, possibly mocking unwanted dependencies| Useful for verifying behavior of units in composition, as they will be used in the end product. The goal is to cover as much as possible without resorting to more costly end-to-end tests |
| End-to-End | Also called _UI Tests_, these tests verify the entire stack for actual customer use cases | Very useful, but generally require more maintenance as they tend to be more fragile. Essential for verifying UI behavior not reflected in a programmatic model. Can work with snapshots (e.g. error label is in red) |
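As a sketch of what the end-to-end flavor can look like (assuming Selenium WebDriver; the URL and element IDs are invented for illustration):

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

[TestClass]
public class LoginEndToEndTests
{
    [TestMethod]
    public void UserSeesDashboardAfterLogin()
    {
        // Drives a real browser against a running instance of the application.
        using var driver = new ChromeDriver();

        driver.Navigate().GoToUrl("https://localhost:5001/login");
        driver.FindElement(By.Id("username")).SendKeys("test-user");
        driver.FindElement(By.Id("password")).SendKeys("test-password");
        driver.FindElement(By.Id("login-button")).Click();

        Assert.IsTrue(driver.FindElement(By.Id("dashboard-title")).Displayed);
    }
}
```

This is exactly the kind of test that is worth having for a handful of central use cases, but that becomes expensive if used for every small behavior.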
## Approach
The article [Write tests. Not too many. Mostly integration.](https://kentcdodds.com/blog/write-tests) describes a pragmatic approach quite well. Instead of the classic “testing pyramid”, it suggests a “testing trophy”.
![image.png](/.attachments/image-6b9cafdf-0bac-4155-bb8f-363a92822bc3.png =300x)
This style of development has the following aims:
1. Verify as much as possible _statically_, with linting and analyzers
1. Make _integration tests_ cheaper because they prove more about your system than _unit tests_
1. Prove as much as possible outside of _end-to-end tests_ because they’re expensive and brittle
## Analysis
> Remember that everything you use has to work both locally and in CI.
### Static-checking
A project should include analyzers and techniques so that the compiler helps make many tests unnecessary. For example, if you know that a parameter or result can never be `null`, then you can avoid a whole slew of tests.
Developers should only spend time writing tests that verify semantic aspects that can’t be proven by the compiler.
#### Null-reference analysis in .NET
The .NET world provides many, many analyzers and tools to verify code quality. One of the most important things a project can do is to improve null-checking. The best way to do this is to upgrade to C# 8 or higher and enable [null-reference analysis](https://learn.microsoft.com/en-us/dotnet/csharp/nullable-references). The [default language for .NET Framework is going to stay C# 7.3](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/configure-language-version), but
you can [enable null-reference analysis for .NET Framework](https://www.infoq.com/articles/CSharp-8-Framework/) quite easily.
Another option is to use the [JetBrains Annotations NuGet package](https://www.nuget.org/packages/JetBrains.Annotations/), which provides attributes to indicate whether parameters or results are nullable.
The preferred way, though, is to use the by-now standard nullability-checking available in .NET.
Doing neither is not a good option, as it will be very difficult to avoid null-reference exceptions.
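As a minimal sketch of why this matters: with nullable reference types enabled, the signature itself documents and enforces the contract, so a “throws on null” test adds nothing:

```csharp
#nullable enable

public static class Greeter
{
    // The compiler warns any caller that tries to pass null here,
    // so there is no need for a test that passes null.
    public static string Greet(string name) => $"Hello, {name}!";

    // If null is a legitimate input, the signature says so explicitly.
    public static string GreetOrDefault(string? name) => $"Hello, {name ?? "anonymous"}!";
}
```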
### Unit-testing
Unit tests are very useful for validating _requirements_ and _invariants_ about your code.
These are the easiest tests to write and will generally be the first ones that you will write.
A requirement or an invariant may be specified in the story itself, but it can be anything that you know about the code that’s important. It’s up to the developer and the reviewer(s) to determine which tests are necessary. It gets easier with experience—and it doesn’t take long to get enough experience so that it’s no longer so intimidating.
#### Unit-testing example
Just as a quick example in .NET, consider the following code,
```csharp
public bool IsDiagnosticModeRunning
{
get => _isDiagnosticModeRunning;
set
{
_isDiagnosticModeRunning = value;
_statusManager.InstrumentState = value ? InstrumentState.DiagnosticMode : InstrumentState.Ready;
}
}
```
Here we see a relatively simple property with a getter and a setter. However, we also see that there is an invariant in the implementation: that the `_statusManager.InstrumentState` is synced with it.
Using many of the [techniques described below](#tools-and-techniques), we could write the following test:
```csharp
[DataRow(true, InstrumentState.DiagnosticMode)]
[DataRow(false, InstrumentState.Ready)]
[TestMethod]
public void TestIsDiagnosticModeRunning(bool running, InstrumentState expectedInstrumentState)
{
var locator = CreateLocator();
var instrumentControlService = locator.GetInstance<IInstrumentControlService>();
var statusManager = locator.GetInstance<IStatusManager>();
Assert.AreNotEqual(expectedInstrumentState, statusManager.InstrumentState);
instrumentControlService.IsDiagnosticModeRunning = running;
Assert.AreEqual(expectedInstrumentState, statusManager.InstrumentState);
}
```
Here, we’re using MSTest to create a parameterized test that,
- Creates the IOC
- Gets the two relevant services from it
- Verifies that the state is not already set to the expected state (in which case the test would succeed even if the tested code doesn’t do anything)
- Sets the property to a given value
- Verifies that the state is correct for that value
We now have code that validates two _facts_ about the system. Should something change where these facts are no longer true, the tests will fail, giving the developer a chance to analyze the situation.
- Was the change inadvertent or deliberate?
- Are the facts still correct? Does the test need to be updated?
If you’re addressing a bug-fix, though, you might be able to _prove_ that you’ve fixed the bug with a unit test, but it’s also likely that you’ll have to write an integration test instead.
### Integration-testing
Unit tests have their place, but they are far too emphasized in the testing pyramid. The testing pyramid comes from a time when writing integration tests was much more difficult than it (theoretically) is today.
The “theoretically” above means that the ability to write integration tests as efficiently as unit tests is contingent on a project offering proper tools and support.
One common complaint about integration tests vis à vis unit tests is that they run more slowly. Another is that they take longer to develop. Ideally, a project provides support to counteract both of these tendencies.
To this end, then, a project should offer base and support classes that make common integration tests easy to set up and quick to execute:
- Interacting with a database
- Setting up a known database schema
- Getting to a clean dataset
- Mocking the database
- Mocking other external dependencies in a project (e.g. loading configuration from an endpoint, sending emails, sending modifications to endpoints)
There are many different ways to solve this problem, each with tradeoffs. For example, a project can load dependencies in Docker containers, either created and started manually (see [Testing your ASP.NET Core application − using a real database](https://josef.codes/testing-your-asp-net-core-application-using-a-real-database/)) or even dynamically with a tool like the [Testcontainers NuGet package](https://github.com/testcontainers/testcontainers-dotnet).
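As a sketch of the dynamic approach (assuming the `Testcontainers.PostgreSql` NuGet package and its builder API; the test itself is invented for illustration):

```csharp
using System.Threading.Tasks;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Testcontainers.PostgreSql;

[TestClass]
public class CustomerRepositoryTests
{
    private static PostgreSqlContainer _database = null!;

    [ClassInitialize]
    public static async Task StartDatabase(TestContext context)
    {
        // Starts a throwaway PostgreSQL instance in Docker for these tests.
        _database = new PostgreSqlBuilder().Build();
        await _database.StartAsync();
    }

    [ClassCleanup]
    public static async Task StopDatabase() => await _database.DisposeAsync();

    [TestMethod]
    public void ConnectionStringPointsAtThrowawayDatabase()
    {
        // The application's data layer would be configured with this connection
        // string instead of a shared development database.
        Assert.IsFalse(string.IsNullOrEmpty(_database.GetConnectionString()));
    }
}
```

The same pattern works for other external dependencies (message brokers, blob storage, etc.), which is what makes “integration tests as cheap as unit tests” realistic rather than aspirational.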
### Comparing Unit and Integration tests
A drawback to unit tests is that, while they can test an individual component well, it’s really the big picture that we want to test. We want to test scenarios that correspond to actual use cases rather than covering theoretical call stacks. It’s not that the second part _isn’t_ important, but that it’s not _as_ important.
Given limited time and resources, we would prefer to have integration tests that also cover a lot of the same code paths that we would have covered with unit tests, rather than to have unit tests, but few to no integration tests.
This, however, leads directly to…
The advantage of a unit test over an integration test is that when it fails, it’s obvious which code failed. An integration test, by its very nature, involves multiple components. When it fails, it might not be obvious which sub-component caused the error.
If you find that you have integration tests failing and it takes a while to figure out what went wrong, then that’s a sign that you should bolster your test suite with more unit tests.
Once an integration test fails _and_ one or more unit tests fail, then you have the best of both worlds: you’ve been made aware that you’ve broken a use case (integration test), but you also know which precise behavior is no longer working as before (unit test).
## Tools and Techniques
### Tests are Code
Test code is just as important as product code. Use all of the same techniques to improve code quality in test code as you would in product code. Clean coding, good variable names, avoid copy/paste coding—all of it applies just as much to tests.
There are two main differences:
- You don’t need to document tests
- You don’t have to write tests for tests. :-)
### Writing testable code
This is a big, big topic, of course. There are a few guidelines that make it easier to write tests—or to avoid having to write tests at all.
As noted above, code that can be validated by the compiler (static analysis) doesn’t need tests. E.g. you don’t have to write a test for how your code behaves when passed a `null` parameter if you just _forbid it_. Likewise, you don’t have to re-verify that types work as they should in statically typed languages. We can trust the compiler.
Here are a handful of tips.
- Prefer composition to inheritance
- A functional programming style is very testable
- An IOC Container is very helpful
- Avoid nullable properties, results, and parameters
- Avoid mutable data
- Interfaces are much easier to fake or mock; use those wherever you can
See the following articles for more ideas.
- [C# Handbook − Chapter 4: Design](https://github.com/mvonballmo/CSharpHandbook/blob/master/4_design.md) (2017)
- [Questions to consider when designing APIs: Part I](https://www.earthli.com/news/view_article.php?id=2996) (2014)
- [Questions to consider when designing APIs: Part II](https://www.earthli.com/news/view_article.php?id=2997) (2014)
- [Why use an IOC? (hint: testing)](https://www.earthli.com/news/view_article.php?id=3487) (2019)
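To make a couple of these tips concrete (interfaces plus an injected dependency), here is a minimal sketch; the types are invented for illustration:

```csharp
using System;

// The interface makes the dependency trivial to replace in a test.
public interface IClock
{
    DateTime Now { get; }
}

public class SystemClock : IClock
{
    public DateTime Now => DateTime.Now;
}

// The class under test receives its dependency instead of creating it,
// so a test (or an IOC container) can supply a fixed, fake clock.
public class GreetingService
{
    private readonly IClock _clock;

    public GreetingService(IClock clock) => _clock = clock;

    public string GetGreeting() => _clock.Now.Hour < 12 ? "Good morning" : "Good afternoon";
}
```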
### Parameterized Tests
Investigate your testing library to learn how to write multiple tests without having to write a lot of code. In the MSTests framework, you can use `DataRow` to parameterize a test. In NUnit, `TestCase` does the same thing, and `Value` allows you to provide parameter values for a list of tests that are the Cartesian product of all values.
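The MSTest flavor (`DataRow`) is shown in the unit-testing example above. As a sketch of the NUnit equivalents:

```csharp
using NUnit.Framework;

[TestFixture]
public class RoundingTests
{
    // One test per TestCase attribute.
    [TestCase(1.24, 1.2)]
    [TestCase(1.26, 1.3)]
    public void RoundsToOneDecimal(double input, double expected)
    {
        Assert.That(System.Math.Round(input, 1), Is.EqualTo(expected).Within(1e-9));
    }

    // Values on multiple parameters yields the Cartesian product: 2 x 2 = 4 tests.
    [Test]
    public void AbsoluteValueIsNeverNegative([Values(-1, 1)] int sign, [Values(0, 5)] int magnitude)
    {
        Assert.That(System.Math.Abs(sign * magnitude), Is.GreaterThanOrEqualTo(0));
    }
}
```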
### Mocking/Faking
Use mocks or fakes to exclude a subsystem from a test. What would you want to exclude? While you will want to make some tests that include database access or REST API calls, there are a lot of tests where you’re proving a fact that doesn’t depend on these results.
#### Focus on what you’re testing
For example, suppose a component reads its configuration from the database by default. A test of that component may simply want to see how it reacts with a given input to a given method. Where the configuration came from is irrelevant to that particular test. In that case, you could mock away the component that loads the configuration from the database and instead use a fake object that just provides some standard values.
#### Test error conditions
Another possibility is to fake an external service to see how your code reacts when the service returns an error or an ambiguous response. Without mocks, how would you test how your code reacts when a REST endpoint returns 503 or 404? Without a mock, how would you force the purely external endpoint to give a certain code? You really can’t. With a mock, though, you can replace the service and return a 404 response for a specific test. This is quite a powerful technique.
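A sketch of the idea (the `IOrderEndpoint` abstraction, the status codes, and the `OrderSender` class are all invented for illustration):

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical abstraction over a REST endpoint.
public interface IOrderEndpoint
{
    int PostOrder(string payload); // returns an HTTP status code
}

// A fake that simulates the endpoint being unavailable.
public class UnavailableOrderEndpoint : IOrderEndpoint
{
    public int PostOrder(string payload) => 503;
}

public class OrderSender
{
    private readonly IOrderEndpoint _endpoint;

    public OrderSender(IOrderEndpoint endpoint) => _endpoint = endpoint;

    public int PendingRetries { get; private set; }

    public void Send(string payload)
    {
        if (_endpoint.PostOrder(payload) >= 500)
        {
            PendingRetries++; // queue for a retry instead of failing outright
        }
    }
}

[TestClass]
public class OrderSenderTests
{
    [TestMethod]
    public void OrderIsQueuedForRetryWhenEndpointIsUnavailable()
    {
        var sender = new OrderSender(new UnavailableOrderEndpoint());

        sender.Send("order-42");

        Assert.AreEqual(1, sender.PendingRetries);
    }
}
```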
#### How to fake?
As noted above, it’s much, much easier to use fake objects if you’ve consistently used interfaces. You can just create your own implementation of the interface whose standard implementation you want to replace, give it a fake implementation (e.g. returning `false` and empty string and `null` for methods and properties), and then use that class as the implementation.
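For example, a hand-written fake for a hypothetical `IEmailSender` interface is only a few lines; it does nothing externally but records what was sent so that a test can assert on it:

```csharp
using System.Collections.Generic;

// Hypothetical production interface.
public interface IEmailSender
{
    void Send(string recipient, string subject, string body);
}

// Hand-written fake: no emails leave the machine, but calls are recorded.
public class FakeEmailSender : IEmailSender
{
    public List<(string Recipient, string Subject, string Body)> SentMessages { get; } = new();

    public void Send(string recipient, string subject, string body) =>
        SentMessages.Add((recipient, subject, body));
}
```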
#### Faking/mocking libraries
If you have interfaces that perform a single task (single-responsibility principle), then it doesn’t take too much effort to write the fake object by hand. However, it’s much easier to use a library to create fake objects—and there are other benefits as well, like tracking which methods were called with which parameters. You can assert on this data collected by the fake object.
For .NET, a great library for faking objects is [FakeItEasy](https://fakeiteasy.github.io/).
With a fake object, you can indicate which values to return for a given set of parameters without too much effort. Similarly, you can use the same API to query how often these methods have been called. This allows you to verify, for example, that a call to a REST service _would have been made_. This is a powerful way of proving facts about your code without having to actually interact with external services.
#### An example
The following code configures a fake object for `ITestUnitConfigurationService` that returns default data for all properties, except for `Configuration` and `GetTestUnitParameterValues()`, which are configured to return specific data.
```csharp
private static ITestUnitConfigurationService CreateFakeTestUnitConfigurationService()
{
var result = A.Fake<ITestUnitConfigurationService>();
var testUnitParameters = CreateTestUnitParameters();
var testUnitConfiguration = new TestUnitConfiguration(testUnitParameters);
A.CallTo(() => result.Configuration).Returns(testUnitConfiguration);
var testUnitParameterValues = CreateTestUnitParameterValues();
A.CallTo(() => result.GetTestUnitParameterValues()).Returns(testUnitParameterValues);
return result;
}
```
In the test, we could get this fake object back out of the IOC (for example) and then verify that certain methods have been called the expected number of times.
```csharp
var testUnitConfigurationService = locator.GetInstance<ITestUnitConfigurationService>();
A.CallTo(() => testUnitConfigurationService.Configuration).MustHaveHappenedOnceExactly();
A.CallTo(() => testUnitConfigurationService.GetTestUnitParameterValues()).MustHaveHappenedOnceExactly();
```
### Snapshot-testing
You can avoid writing a ton of assertions and a ton of tests with snapshot testing.
For example, imagine you have a test that generates a particular view model. You want to verify 30 different parts of this complex model.
You _could_ navigate the data structure, asserting the 30 values individually.
That would be pretty tedious, though, and lead to fragile and hard-to-maintain testing code.
Instead, you could emit that structure as text and save it as a _snapshot_ in the repository. If a future code change leads to a different snapshot, the test fails and the developer that caused the failure would have to approve the new snapshot (if it’s an expected or innocuous change) or fix the code (if it was inadvertent and wrong).
The upside is that large swaths of assertions are reduced to a simple snapshot assertion. The downside is that the test might break more often for spurious reasons. Generally, you can avoid these spurious reasons by being judicious about how you format the snapshot:
- Avoid timestamps or data that changes over time
- Avoid using output methods that are too likely to change over time
See the documentation for the [Snapshooter NuGet package](https://swisslife-oss.github.io/snapshooter/).
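As a minimal sketch (using the `Snapshooter.Xunit` flavor; the view model here is an anonymous object invented for illustration):

```csharp
using Snapshooter.Xunit;
using Xunit;

public class InvoiceViewModelTests
{
    [Fact]
    public void InvoiceViewModelMatchesSnapshot()
    {
        // In a real test, this would come from the code under test.
        var viewModel = new
        {
            Number = "2023-0042",
            Currency = "CHF",
            Lines = new[] { new { Text = "Consulting", Amount = 1200m } }
        };

        // Compares against a stored *.snap file; the first run creates it.
        Snapshot.Match(viewModel);
    }
}
```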
Published by marco on 22. Jan 2019 19:47:58 (GMT-5)
If you’re familiar with the topic, you might be recoiling in horror. It would be unclear, though, whether you’re recoiling from the “using Collab” part or the “using Collab with Git” part.
Neither is as straightforward as I’d hoped.
tl;dr: If you have to use Collab with Unity, but want to back it up with Git, disable `core.autocrlf` [1] and add `* -text` to the `.gitattributes`.
Collab is the source-control system integrated into the Unity IDE.
It was built for designers to be able to do some version control, but not much more. Even with its limited scope, it’s a poor tool.
This is really dangerous, especially with Unity projects. There is so much in a Unity project without a proper “Undo” that you very often want to return to a known good version.
So what can we do to improve this situation? We would like to use Git instead of Collab.
However, we have to respect the capabilities and know-how of the designers on our team, who don’t know how to use Git.
On our current project, there’s no time to train everyone on Git—and they already know how to use Collab and don’t feel tremendously limited by it.
Remember, any source control is better than no source control. The designers are regularly backing up their work now. In its defense, Collab is definitely better than nothing (or using a file-share or some other weak form of code-sharing).
Instead, those of us who know Git are using Git alongside Collab.
We started naively, with all of our default settings in Git. Our workflow was:
Unfortunately, we would often end up with a ton of files marked as changed in Collab. These were always line-ending differences. As mentioned above, Collab is not a good tool for reverting changes.
The project has time constraints—it’s a prototype for a conference, with a hard deadline—so, despite its limitations, we reverted in Collab and updated Git with the line-endings that Collab expected.
We limped along like this for a bit, but with two developers on Git/Collab on Windows and one designer on Collab on Mac, we were spending too much time “fixing up” files. The benefit of having Git was outweighed by the problems it caused with Collab.
So we investigated what was really going on. The following screenshots show that Collab doesn’t seem to care about line-endings. They’re all over the map.
Git, on the other hand, really cares about line-endings. By default, Git will transform the line-endings in files that it considers to be text files (this part is important later) to the line-ending of the local platform.
In the repository, all text files are LF-only. If you work on MacOS or Linux, line-endings in the workspace are unchanged; if you work on Windows, Git changes all of these line-endings to CRLF on checkout—and back to LF on commit.
Our first “fix” was to turn off the `core.autocrlf` option in the local Git repository:

```
git config --local core.autocrlf false
```
We thought this would fix everything since now Git was no longer transforming our line-endings on commit and checkout.
This turned out to be only part of the problem, though. As you can see above, the text files in the repository have an arbitrary mix of line-endings already. Even with the feature turned off, Git was still normalizing line-endings to LF on Windows.
The only thing we’d changed so far was to stop converting LF to CRLF on checkout. Any time we ran `git reset`, for example, the line-endings in our workspace would still end up being different than what was in Git or Collab.
What we really want is for Git to stop changing any line-endings at all.
This isn’t part of the command-line configuration, though. Instead, you have to set up `.gitattributes`. Git has default settings that determine which files it treats as which types. We wanted to adjust these default settings by telling Git that, in this repository, it should treat no files as text.
Once we knew this, it was quite easy to configure. Simply add a `.gitattributes` file to the root of the repository, with the following contents:

```
* -text
```
This translates to “do not treat any file as text” (i.e. match all files; disable text-handling).
With these settings, the two developers were able to reset their workspaces and both Git and Collab were happy. Collab is still a sub-par tool, but we can now work with designers and still have Git to allow the developers to use a better workflow.
The designers using only Collab were completely unaffected by our changes.
[1] It may not be necessary to change the `autocrlf` setting. Turning off text-handling in Git should suffice. However, I haven’t tested with this feature left on and, due to time-constraints, am not going to risk it.
Published by marco on 21. Jan 2019 20:26:52 (GMT-5)
Quino contains a Sandbox in the main solution that lets us test a lot of the Quino subsystems in real-world conditions. The Sandbox has several application targets:
The targets that connect directly to a database (e.g. WPF, Winform) were using the PostgreSql driver by default. I wanted to configure all Sandbox applications to be easily configurable to run with SqlServer.
This is pretty straightforward for a Quino application. The driver can be selected directly in the application (directly linking the corresponding assembly) or it can be configured externally.
Naturally, if the Sandbox loads the driver from configuration, some mechanism still has to make sure that the required data-driver assemblies are available.
The PostgreSql driver was in the output folder. This was expected, since that driver works. The SqlServer was not in the output folder. This was also expected, since that driver had never been used.
I checked the direct dependencies of the Sandbox Winform application, but it didn’t include the PostgreSql driver. That’s not really good, as I would like both SqlServer and PostgreSql to be configured in the same way. As it stood, though, I would be referencing SqlServer directly and PostgreSql would continue to show up by magic.
Before doing anything else, I was going to have to find out why PostgreSql was included in the output folder.
I needed to figure out assembly dependencies.
My natural inclination was to reach for NDepend, but I thought maybe I’d see what the other tools have to offer first.
Does Visual Studio include anything that might help? The “Project Dependencies” shows only assemblies on which a project is dependent. I wanted to find assemblies that were dependent on PostgreSql. I have the Enterprise version of Visual Studio and I seem to recall an “Architecture” menu, but I discovered that these tools are no longer installed by default.
According to the VS support team in that link, you have to install the “Visual Studio extension development” workload in the Visual Studio installer. In this package, the “Architecture and analysis tools” feature is available, but not included by default.
Hovering this feature shows a tooltip indicating that it contains “Code Map, Live Dependency Validation and Code Clone detection”. The “Live Dependency Validation” sounds like it might do what I want, but it also sounds quite heavyweight and somewhat intrusive, as described in this blog from the end of 2016 (MSDN). Instead of further modifying my VS installation (and possibly slowing it down), I decided to try another tool.
What about ReSharper? For a while now, it’s included project-dependency graphs and hierarchies. Try as I might, I couldn’t get the tools to show me the transitive dependency on PostgreSql that Sandbox Winform was pulling in from somewhere. The hierarchy view is live and quick, but it doesn’t show all transitive usages.
The graph view is nicely rendered, but shows dependencies by default instead of dependencies and usages. At any rate, the Sandbox wasn’t showing up as a transitive user of PostgreSql.
I didn’t believe ReSharper at this point because something was causing the data driver to be copied to the output folder.
So, as expected, I turned to NDepend. I took a few seconds to run an analysis and then right-clicked the PostgreSql data-driver project to select `NDepend => Select Assemblies… => That are Using Me (Directly or Indirectly)` to show the following query and results.
Bingo. `Sandbox.Model` is indirectly referencing the PostgreSql data driver, via a transitive-dependency chain of 4 assemblies. Can I see which assemblies they are? Of course I can: this kind of information is best shown on a graph, so you can show a graph of any query results by clicking `Export to Graph` to show the graph below.
Now I can finally see that the `SandboxModel` pulls in the `Quino.Testing.Models.Generated` (to use the `BaseTypes` module) which, in turn, has a reference to `Quino.Tests.Base` which, of course, includes the PostgreSql driver because that’s the default testing driver for Quino tests.
Now that I know how the reference is coming in, I can fix the problem. Here I’m on my own: I have to solve this problem without NDepend. But at least NDepend was able to show me exactly what I have to fix (unlike VS or ReSharper).
I ended up moving the test-fixture base classes from `Quino.Testing.Models.Generated` into a new assembly called `Quino.Testing.Models.Fixtures`. The latter assembly still depends on `Quino.Tests.Base` and thus the PostgreSql data driver, but it’s now possible to reference the Quino testing models without transitively referencing the PostgreSql data driver.
A quick re-analysis with NDepend and I can see that the same query now shows a clean view: only testing code and testing assemblies reference the PostgreSql driver.
And now to finish my original task! I ran the Winform Sandbox application with the PostgreSql driver configured and was greeted with an error message that the driver could not be loaded. I now had parity between PostgreSql and SqlServer.
The fix? Obviously, make sure that the drivers are available by referencing them directly from any Sandbox application that needs to connect to a database. This was the obvious solution from the beginning, but we had to quickly fix a problem with dependencies first. Why? Because we hate hacking. :-)
Two quick references added, one build and I was able to connect to both SQL Server and PostgreSql.
Published by marco on 20. Jan 2019 22:37:35 (GMT-5)
Updated by marco on 21. Jan 2019 10:00:49 (GMT-5)
In late 2011 and early 2012, Encodo designed a querying language for Quino. Quino has an ORM that, combined with .NET Linq, provides a powerful querying interface for developers. QQL is a DSL that brings this power to non-developers.
QQL never made it to implementation—only specification. In the meantime, the world moved on and we have common, generic querying APIs like OData. The time for QQL is past, but the specification is still an interesting artifact, in its own right.
Who knows? Maybe we’ll get around to implementing some of it, at some point.
At any rate, you can download the specification from Encodo or here at earthli.
The following excerpts should give you an idea of what you’re in for, should you download and read the 80-page document.
The TOC lists the following top-level chapters:
From the abstract in the document:
“The Quino Query Language (QQL) defines a syntax and semantics for formulating data requests against hierarchical data structures. It is easy to read and learn both for those familiar with SQL and non-programmers with a certain capacity for abstract thinking (i.e. power users). Learning only a few basic rules is enough to allow a user to quickly determine which data will be returned by all but the more complex queries. As with any other language, more complex concepts result in more complex texts, but the syntax of QQL limits these cases.”
From the overview:
“QQL defines a syntax and semantics for writing queries against hierarchical data structures. A query describes a set of data by choosing an initial context in the data and specifying which data are to be returned and how the results are to be organized. An execution engine generates this result by applying the query to the data.”
The following is from chapter 2.1, “Simple Standard Query”:
The following query returns the first and last name of all active people as well as their 10 most recent time entries, reverse-sorted first by last name, then by first name.
```
Person { select { FirstName; LastName; Sample:= TimeEntries { orderby Date desc; limit 10 } } where Active orderby { LastName desc; FirstName desc; } }
```
In chapter 2, there are also “2.2 Intermediate Standard Query” and “2.3 Complex Standard Query” examples.
The following is from chapter 2.4, “Simple Grouping Query”:
The following query groups active people by last name and returns the age of the youngest person and the maximum contracts for each last name. Results are ordered by the maximum contracts for each group and then by last name.
```
group Person { groupby LastName; select { default; Age:= (Now − BirthDate.Min).Year; MaxContracts:= Contracts.Count.Max } where Active; orderby { MaxContracts desc; LastName desc; } }
```
In chapter 2, there are also “2.5 Complex Grouping Query”, “2.6 Standard Query with Grouping Query” and “2.7 Nested Grouping Queries” examples.
Published by marco on 20. Jan 2019 22:19:02 (GMT-5)
Updated by marco on 20. Jan 2019 22:20:10 (GMT-5)
Due to the nature of the language, there are some API changes that almost inevitably lead to breaking changes in C#.
While you can easily make another constructor, marking the old one(s) as obsolete, if you use an IOC that allows only a single public constructor, you’re forced to either remove the old constructor right away or make it `protected`. In either case, the user has a compile error.
There are several known issues with introducing new methods or changing existing methods on an existing interface. For many of these situations, there are relatively smooth upgrade paths.
I encountered a situation recently that I thought worth mentioning. I wanted to introduce a new overload on an existing type.
Suppose you have the following method:
```csharp
bool TryGetValue<T>(
  out T value,
  TKey key = default(TKey),
  [CanBeNull] ILogger logger = null
);
```
We would like to remove the `logger` parameter. So we deprecate the method above and declare the new method.
```csharp
bool TryGetValue<T>(
  out T value,
  TKey key = default(TKey)
);
```
Now the compiler/ReSharper notifies you that there will be an ambiguity if a caller does not pass a `logger`. How to resolve this? Well, we can just remove the default value for that parameter in the obsolete method.
```csharp
bool TryGetValue<T>(
  out T value,
  TKey key = default(TKey),
  [CanBeNull] ILogger logger
);
```
But now you’ve got another problem: the parameter `logger` cannot come after the `key` parameter because it doesn’t have a default value.
So, now you’d have to move the `logger` parameter in front of the `key` parameter. This will cause a compile error in clients, which is what we were trying to avoid in the first place.
In this case, we have a couple of sub-optimal options.
Use a different name for the new API (e.g. `TryGetValueEx` à la Windows) in the next major version, then switch the name back in the version after that and finally remove the obsolete member in yet another version.
That is,
1. `TryGetValue` (with logger) is obsolete and users are told to use `TryGetValueEx` (no logger)
1. `TryGetValueEx` (no logger) is obsolete and users are told to use `TryGetValue` (no logger)
1. `TryGetValueEx` is removed.

This is a lot of work and requires three upgrades to accomplish. You really need to stay on the ball in order to get this kind of change integrated and it takes a non-trivial amount of time and effort.
We generally don’t use this method, as our customers are developers and can deal with a compile error or two, especially when it’s noted in the release notes and the workaround is fairly obvious (e.g. the `logger` parameter is just no longer required).
Published by marco on 20. Jan 2019 22:00:30 (GMT-5)
Any software product should have a version number. This article will answer the following questions about how Encodo works with them.
In decreasing order of expected expertise,
The intended audience of this document is *developers*.
The `quino` command-line tool is installed on all machines. This tool can *read* and *write* version numbers for any .NET solution, regardless of which of the many version-numbering methods a given solution actually uses.
Encodo uses semantic versions. This scheme has a strict ordering that allows you to determine which version is “newer”. It indicates pre-releases (e.g. alphas, betas, rcs) with a “minus”, as shown below.
Version numbers come in two flavors:
- `[Major].[Minor].[Patch].[Build]` (official release)
- `[Major].[Minor].[Patch]-[Label][Build]` (pre-release)
See Microsoft’s NuGet Package Version Reference for more information.
- `0.9.0-alpha34`: A pre-release of 0.9.0
- `0.9.0-beta48`: A pre-release of 0.9.0
- `0.9.0.67`: An official release of 0.9.0
- `1.0.0-rc512`: A pre-release of 1.0.0
- `1.0.0.523`: An official release of 1.0.0

The numbers are strictly ordered. The first three *parts* indicate the “main” version. The final *part* counts strictly upward.
The following list describes each of the parts and explains what to expect when it changes.
This part is also known as “Maintenance” (see [Software versioning](https://en.wikipedia.org/wiki/Software_versioning) on Wikipedia).
There will only ever be one artifact of an official release corresponding to a given “main” version number.
That is, if `1.0.0.523` exists, then there will never be a `1.0.0.524`. This is due to the fact that the build number (e.g. 524) is purely for auditing.
For example, suppose your software uses a NuGet package with version `1.0.0.523`. NuGet will not offer to upgrade to `1.0.0.524`.
There are no restrictions on the labels for pre-releases. However, it’s recommended to use one of the following:
- `alpha`
- `beta`
- `rc`
Be aware that if you choose a different label, then it is ordered alphabetically relative to the other pre-releases.
For example, if you were to use the label `prealpha` to produce the version `0.9.0-prealpha21`, then that version is considered to be higher than `0.9.0-alpha34`. A tool like NuGet will not see the latter version as an upgrade.
The name of a release branch should be the major version of that release, e.g. `release/1` for version 1.x.x.x.
The name of a pre-release branch should be of the form `feature/[label]`, where `[label]` is one of the labels recommended above. It’s also OK to use a personal branch to create a pre-release build, as in `mvb/[label]`.
A developer uses the `quino` tool to set the version.
For example, to set the version to 1.0.1, execute the following:

```
quino fix -v 1.0.1.0
```
The tool will have updated the version number in all relevant files.
The build server calculates a release’s version number as follows,
The name of the Git branch determines which kind of release to produce.
**/release/*
, then it’s an official releaseFor example,
origin/release/1
origin/production/release/new
origin/release/
release/1
production/release/new
release/
The name of the branch doesn’t influence the version number since an official release doesn’t have a label.
The label is taken from the last part of the branch name.
For example,
origin/feature/beta
yields beta
origin/feature/rc
yields rc
origin/mvb/rc
yields rc
The following algorithm ensures that the label can be part of a valid semantic version.
X
after a trailing digitX
if the label is empty (or becomes empty after having removed invalid characters)For example,
origin/feature/rc1
yields rc1X
origin/feature/linuxcompat
yields linuxcompat
origin/feature/12
yields X
Assume that,
Then,
origin/release/1
produces artifacts with version number 0.9.0.522
origin/feature/rc
produces artifacts with version number 0.9.0-rc522
The following are very concise guides for how to produce artifacts.
feature/rc
, master
)quino fix -v 1.0.2.0
release/1
)quino fix -v 1.0.2.0`
)The summary below describes major new features, items of note and breaking changes.
The... [More]
]]>Published by marco on 20. Jan 2019 21:59:55 (GMT-5)
Note: this article was originally published at Encodo.com at the end of October, 2018.
The summary below describes major new features, items of note and breaking changes.
The links above require a login.
At long last, Quino enters the world of .NET Standard and .NET Core. Libraries target .NET Standard 2.0, which means they can all be used with any .NET runtime on any .NET platform (e.g. Mac and Linux). Sample applications and testing assemblies target .NET Core 2.0. Tools like quinogenerate
and quinofix
target .NET Core 2.1 to take advantage of the standardized external tool-support there.
Furthermore, the Windows, Winform and WPF projects have moved to a separate solution/repository called Quino-Windows
.
Quino-Standard
is the core on which both Quino-Windows
and Quino-WebAPI
build.
Quino-Windows
target .NET Framework 4.6.2 because that’s the first framework that can interact with .NET Standard (and under which Windows-specific code runs).Quino-WebAPI
currently target .NET Framework 4.6.2. We plan on targeting .NET Core in an upcoming version (tentatively planned for v7).IIdentity
everywhere (deprecating ICredentials
and IUserCredentials
).6.0 is a pretty major break from the 5.x release. Although almost all assembly names have stayed the same, we had to move some types around to accommodate targeting .NET Standard with 85% of Quino’s code.
We’ve tried to support existing code wherever possible, but some compile errors will be unavoidable (e.g. from namespace changes or missing references). In many cases, R#/VS should be able to help repair these errors.
These are the breaking changes that are currently known.
IRunSettings
and RunMode
from Encodo.Application
to Encodo.Core
.Any .NET Framework executable that uses assemblies targeting .NET Standard must reference .NET Standard itself. The compiler (MSBuild
) in Visual Studio will alert you to add a reference to .NET Standard using NuGet. This applies not just to Winform executables, but also to any unit-test assemblies.
One piece that has changed significantly is the tool support formerly provided with Quino.Utils
. As of version 6, Quino no longer uses NAnt
, instead providing dotnet
-compatible tools that you can install using common .NET commands. Currently, Quino supports:
dotnet quinofix
dotnet quinogenerate
dotnet quinopack
Please see the tools documentation for more information on how to install and use the new tools.
The standalone Winforms-based tools are in the Quino-Windows
download, in the Tools.zip
archive.
Quino.Migrator
Quino.PasswordEncryptor
Quino.Utils
is no longer supported as a NuGet package.
In recent articles, we outlined a roadmap to .NET Standard and .NET Core and a roadmap for deployment and debugging. These two roadmaps taken together illustrate our plans to extend as much of Quino as possible to other... [More]
]]>Published by marco on 20. Jan 2019 21:59:29 (GMT-5)
Note: this article was originally published at Encodo.com in July, 2018.
In recent articles, we outlined a roadmap to .NET Standard and .NET Core and a roadmap for deployment and debugging. These two roadmaps taken together illustrate our plans to extend as much of Quino as possible to other platforms (.NET Standard/Core) and to make development with Quino as convenient as possible (getting/upgrading/debugging).
To round it off, we’ve made good progress on another vital piece of any framework: documentation.
We recently set up a new server to host Quino documentation. There, you can find documentation for current releases. Going forward, we’ll also retain documentation for any past releases.
We’re generating our documentation with DocFX, which is the same system that powers Microsoft’s own documentation web site. We’ve integrated documentation-generation as a build step in Quino’s nightly build on TeamCity, so it’s updated every night (Zürich time) [1].
The documentation includes conceptual documentation which provides an overview/tutorials/FAQ for basic concepts in Quino. The API Reference includes comprehensive documentation about the types and methods available in Quino.
While we’re happy to announce that we have publicly available documentation for Quino, we’re aware that we’ve got work to do. The next steps are:
Even though there’s still work to do, this is a big step in the right direction. We’re very happy to have found DocFX, which is a very comprehensive, fast and nice-looking solution to generating documentation for .NET code. [2]
In a recent article, we outlined a roadmap to .NET Standard and .NET Core. We’ve made really good progress on that front: we have a branch of Quino-Standard that targets .NET Standard for class libraries and .NET Core for... [More]
]]>Published by marco on 20. Jan 2019 21:58:53 (GMT-5)
Note: this article was originally published at Encodo.com in July, 2018.
In a recent article, we outlined a roadmap to .NET Standard and .NET Core. We’ve made really good progress on that front: we have a branch of Quino-Standard that targets .NET Standard for class libraries and .NET Core for utilities and tests. So far, we’ve smoke-tested these packages with Quino-WebApi. Our next steps there are to convert Quino-WebApi to .NET Standard and .NET Core as well. We’ll let you know when it’s ready, but progress is steady and promising.
With so much progress on several fronts, we want to address how we get Quino from our servers to our customers and users.
Currently, we provide access to a private fileshare for customers. They download the NuGet packages for the release they want. They copy these to a local folder and bind it as a NuGet source for their installations.
In order to make a build available to customers, we have to publish that build by deploying it and copying the files to our file share. This process has been streamlined considerably so that it really just involves telling our CI server (TeamCity) to deploy a new release (official or pre-). From there, we download the ZIP and copy it to the fileshare.
Encodo developers don’t have to use the fileshare because we can pull packages directly from TeamCity as soon as they’re available. This is a much more comfortable experience and feels much more like working with nuget.org directly.
The debugging story with external code in .NET is much better than it used to be (spoiler: it was almost impossible, even with Microsoft sources), but it’s not as smooth as it should be. This is mostly because NuGet started out as a packaging mechanism for binary dependencies published by vendors with proprietary/commerical products. It’s only in recent year(s) that packages are predominantly open-source.
In fact, debugging with third-party sources—even without NuGet involved—has never been easy with .NET/Visual Studio.
Currently, all Quino developers must download the sources separately (also available from TeamCity or the file-share) in order to use source-level debugging.
Binding these sources to the debugger is relatively straightforward but cumbersome. Binding these sources to ReSharper is even more cumbersome and somewhat unreliable, to boot. I’ve created the issue Add an option to let the user search for external sources explicitly (as with the VS debugger) when navigating in the hopes that this will improve in a future version. JetBrains has already fixed one of my issues in this are (Navigate to interface/enum/non-method symbol in Nuget-package assembly does not use external sources), so I’m hopeful that they’ll appreciate this suggestion, as well.
The use case I cited in the issue above is,
Developers using NuGet packages that include sources or for which sources are available want to set breakpoints in third-party source code. Ideally, a developer would be able to use R# to navigate through these sources (e.g. via F12) to drill down into the code and set a breakpoint that will actually be triggered in the debugger.
As it is, navigation in these sources is so spotty that you often end up in decompiled code and are forced to use the file-explorer in Windows to find the file and then drag/drop it to Visual Studio where you can set a breakpoint that will work.
The gist of the solution I propose is to have R# ask the user where missing sources are before decompiling (as the Visual Studio debugger does).
There is hope on the horizon, though: Nuget is going to address the debugging/symbols/sources workflow in an upcoming release. The overview is at NuGet Package Debugging & Symbols Improvements and the issue is Improve NuGet package debugging and symbols experience.
Once this feature lands, Visual Studio will offer seamless support for debugging packages hosted on nuget.org. Since we’re using TeamCity to host our packages, we need JetBrains to Add support for NuGet Server API v3 [1] in order to benefit from the improved experience. Currently, our customers are out of luck even if JetBrains releases simultaneously (because our TeamCity is not available publicly).
I’ve created an issue for Quino, Make Quino Nuget packages available publicly to track our progress in providing Quino packages to our customers in a more convenient way that also benefits from improvements to the debugging workflow with Nuget Packages.
If we published Quino packages to NuGet (or MyGet, which allows private packages), then we would have the benefit of the latest Nuget protocol/improvements for both ourselves and our customers as soon as it’s available. Alternatively, we could also proxy our TeamCity feed publicly. We’re still considering our options there.
As you can see, we’re always thinking about the development experience for both our developers and our customers. We’re fine-tuning on several fronts to make developing and debugging with Quino a seamless experience for all developers on all platforms.
We’ll keep you posted.
The title is a bit specific for this blog post, but that’s the gist of it: we ended up with a bunch of references to an in-between version of .NET (4.6.1) that was falsely advertising itself as a more optimal candidate for... [More]
]]>Published by marco on 20. Jan 2019 21:55:36 (GMT-5)
Note: this article was originally published at Encodo.com in July, 2018.
The title is a bit specific for this blog post, but that’s the gist of it: we ended up with a bunch of references to an in-between version of .NET (4.6.1) that was falsely advertising itself as a more optimal candidate for satisfying 4.6.2 dependencies. This is a known issue; there are several links to MS GitHub issues below.
In this blog, I will discuss direct vs. transient dependencies as well as internal vs. runtime dependencies.
If you’ve run into problems with an application targeted to .NET Framework 4.6.2 that does not compile on certain machines, it’s possible that the binding redirects Visual Studio has generated for you use versions of assemblies that aren’t installed anywhere but on a machine with Visual Studio installed.
How I solved this issue:
C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\MSBuild\Microsoft\Microsoft.NET.Build.Extensions\net461\
directorybin/
and obj/
folders.vs
folder (may not be strictly necessary)<AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
to your project)The product should now run locally and on other machines.
For more details, background and the story of how I ran into and solved this problem, read on.
Note: I published a recent article, .NET Tips and Resources, containing a link to a video by Immo Landwerth, in which says “If you want to be compatible with .NET Core 1.5 or lower, then you can use .NET Framework 4.6.1. For .NET Standard compatibility, you should definitely use .NET Framework 4.7.2 instead.” That will probably fix the problem as well. Moving to .NET Core will also fix the problem, as all binding is handled automatically there.
What do we mean when we say that we “build” an application?
Building is the process of taking a set of inputs and producing an artifact targeted at a certain runtime. Some of these inputs are included directly while others are linked externally.
The machine does exactly what you tell it to, so it’s up to you to make sure that your instructions are as precise as possible. However, you also want your application to be flexible so that it can run on as wide an array of environments as possible.
Your source code consists of declarations. We’ve generally got the direct inputs under control. The code compiles and produces artifacts as expected. It’s the external-input declarations where things go awry.
What kind of external inputs does our application have?
How is this stitched together to produce the application that is executed?
The NuGet dependencies are resolved at build time. All resources are pulled and added to the release on the build machine. There are no run-time decisions to make about which versions of which assemblies to use.
Dependencies come in two flavors:
It is with the transient references that we run into issues. The following situations can occur:
An application generally includes an app.config
(desktop applications or services) or web.config
XML file that includes a section where binding redirects are listed. A binding redirect indicates the range of versions that can be mapped (or redirected) to a certain fixed version (which is generally also included as a direct dependency).
A redirect looks like this (a more-complete form is further below):
<bindingRedirect oldVersion="0.0.0.0-4.0.1.0" newVersion="4.0.1.0"/>
When the direct dependency is updated, the binding redirect must be updated as well (generally by updating the maximum version number in the range and the version number of the target of the redirect). NuGet does this for you when you’re using package.config
. If you’re using Package References, you must update these manually. This situation is currently not so good, as it increases the likelihood that your binding redirects remain too restrictive.
NuGet packages are resolved at build time. These dependencies are delivered as part of the deployment. If they could be resolved on the build machine, then they are unlikely to cause issues on the deployment machine.
Where the trouble comes in is with dependencies that are resolved at execution time rather than build time. The .NET Framework assemblies are resolved in this manner. That is, an application that targets .NET Framework expects certain versions of certain assemblies to be available on the deployment machine.
We mentioned above that the algorithm sometimes chooses the desired version or higher. This is not the case for dependencies that are in the assembly-binding redirects. Adding an explicit redirect locks the version that can be used.
This is generally a good idea as it increases the likelihood that the application will only run in a deployment environment that is extremely close or identical to the development, building or testing environment.
How can we avoid these pesky run-time dependencies? There are several ways that people have come up with, in increasing order of flexibility:
To sum up:
Our application targets .NET Framework (for now). We’re looking into .NET Core, but aren’t ready to take that step yet.
To sum up the information from above, problems arise when the build machine contains components that are not available on the deployment machine.
How can this happen? Won’t the deployment machine just use the best match for the directives included in the build?
Ordinarily, it would. However, if you remember our discussion of assembly-binding redirects above, those are set in stone. What if you included binding redirects that required versions of system dependencies that are only available on your build machine … or even your developer machine?
We actually discovered an issue in our deployment because the API server was running, but the Authentication server was not. The Authentication server was crashing because it couldn’t find the runtime it needed in order to compile its Razor views (it has ASP.Net MVC components). We only discovered this issue on the deployment server because the views were only ever compiled on-the-fly.
To catch these errors earlier in the deployment process, you can enable pre-compiling views in release mode so that the build server will fail to compile instead of producing a build that will sometimes fail to run.
Add <MvcBuildViews>true</MvcBuildViews> to any MVC projects in the PropertyGroup for the release build, as shown in the example below:
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>bin</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
<LangVersion>6</LangVersion>
<MvcBuildViews>true</MvcBuildViews>
</PropertyGroup>
We mentioned above that NuGet is capable of updating these redirects when the target version changes. An example is shown below. As you can see, they’re not very easy to write:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<dependentAssembly>
<assemblyIdentity
name="System.Reflection.Extensions"
publicKeyToken="B03F5F7F11D50A3A"
culture="neutral"/>
<bindingRedirect oldVersion="0.0.0.0-4.0.1.0" newVersion="4.0.1.0"/>
</dependentAssembly>
<!-- Other bindings… -->
</assemblyBinding>
</runtime>
</configuration>
Most bindings are created automatically when MSBuild emits a warning that one would be required in order to avoid potential runtime errors. If you compile with MSBuild in Visual Studio, the warning indicates that you can double-click the warning to automatically generate a binding.
If the warning doesn’t indicate this, then it will tell you that you should add the following to your project file:
<AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
After that, you can rebuild to show the new warning, double-click it and generate your assembly-binding redirect.
When MSBuild generates a redirect, it uses the highest version of the dependency that it found on the build machine. In most cases, this will be the developer machine. A developer machine tends to have more versions of the runtime targets installed than either the build or the deployment machine.
A Visual Studio installation, in particular, includes myriad runtime targets, including many that you’re not using or targeting. These are available to MSBuild but are ordinarily ignored in favor of more appropriate ones.
That is, unless there’s a bit of a bug in one or more of the assemblies included with one of the SDKs…as there is with the net461 distribution in Visual Studio 2017.
Even if you are targeting .NET Framework 4.6.2, MSBuild will still sometimes reference assemblies from the 461 distribution because the assemblies are incorrectly marked as having a higher version than those in 4.6.2 and are taken first.
I found the following resources somewhat useful in explaining the problem (though none really offer a solution):
How can you fix the problem if you’re affected?
You’ll generally have a crash on the deployment server that indicates a certain assembly could not be loaded (e.g. System.Runtime). If you show the properties for that reference in your web application, do you see the path C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\MSBuild\Microsoft\Microsoft.NET.Build.Extensions\net461 somewhere in there? If so, then your build machine is linking in references to this incorrect version. If you let MSBuild generate binding redirects with those referenced paths, they will refer to versions of runtime components that do not generally exist on a deployment machine.
Tips for cleaning up:
- Check the build output: do you see C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\MSBuild\Microsoft\Microsoft.NET.Build.Extensions\net461 in the output? A sample warning message:
Platform:System.Collections.dll and CopyLocal:C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\MSBuild\Microsoft\Microsoft.NET.Build.Extensions\net461\lib\System.Collections.dll. Choosing CopyLocal:C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\MSBuild\Microsoft\Microsoft.NET.Build.Extensions\net461\lib\System.Collections.dll because AssemblyVersion 4.0.11.0 is greater than 4.0.10.0.
As mentioned above, but reiterated here, this is what I did to finally stabilize my applications:
- Remove (or rename) the C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\MSBuild\Microsoft\Microsoft.NET.Build.Extensions\net461\ directory
- Delete the bin/ and obj/ folders
- Delete the .vs folder (may not be strictly necessary)
- Add <AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects> to your project
When you install any update of Visual Studio, it will silently repair these missing files for you. So be aware and check the folder after any installations or upgrades to make sure that the problem doesn’t creep up on you again.
Published by marco on 20. Jan 2019 21:49:13 (GMT-5)
Note: this article was originally published at Encodo.com in May, 2018.
With Quino 5, we’ve gotten to a pretty good place organizationally. Dependencies are well-separated into projects—and there are almost 150 of them.
We can use code-coverage, solution-wide-analysis and so on without a problem. TeamCity runs the ~10,000 tests quickly enough to provide feedback in a reasonable time. The tests run even more quickly on our desktops. It’s a pretty comfortable and efficient experience, overall.
As of Quino 5, all Quino-related code was still in one repository and included in a single solution file. Luckily for us, Visual Studio 2017 (and Rider and Visual Studio for Mac) were able to keep up quite well with such a large solution. Recent improvements to performance kept the experience quite comfortable on a reasonably equipped developer machine.
Having everything in one place is both an advantage and disadvantage: when we make adjustments to low-level shared code, the refactoring is applied in all dependent components, automatically. If it’s not 100% automatic, at least we know where we need to make changes in dependent components. This provides immediate feedback on any API changes, letting us fine-tune and adjust until the API is appropriate for known use cases.
On the other hand, having everything in one place means that you must make sure that your API not only works for but compiles and tests against components that you may not immediately be interested in.
For example, we’ve been pushing much harder on the web front lately. Changes we make in the web components (or in the underlying Quino core) must also work immediately for dependent Winform and WPF components. Otherwise, the solution doesn’t compile and tests fail.
While this setup had its benefits, the drawbacks were becoming more painful. We wanted to be able to work on one platform without worrying about all of the others.
On top of that, all code in one place is no longer possible with cross-platform support. Some code—Winform and WPF—doesn’t run on Mac or Linux. [1]
The time had come to separate Quino into a few larger repositories.
We decided to split along platform-specific lines.
The Quino-WebApi and Quino-Windows solutions will consume Quino-Standard via NuGet packages, just like any other Quino-based product. And, just like any Quino-based product, they will be able to choose when to upgrade to a newer version of Quino-Standard.
Part of the motivation for the split is cross-platform support. The goal is to target all assemblies in Quino-Standard to .NET Standard 2.0. The large core of Quino will be available on all platforms supported by .NET Core 2.0 and higher.
This work is quite far along and we expect to complete it by August 2018.
As of Quino 5.0.5, we’ve moved web-based code to its own repository and set up a parallel deployment for it. Currently, the assemblies still target .NET Framework, but the goal here is to target class libraries to .NET Standard and to use .NET Core for all tests and sample web projects.
We expect to complete this work by August 2018 as well.
We will be moving all Winform and WPF code to its own repository, setting it up with its own deployment (as we did with Quino-WebApi). These projects will remain targeted to .NET Framework 4.6.2 (the lowest version that supports interop with .NET Standard assemblies).
We expect this work to be completed by July 2018.
One goal we have with this change is to be able to use Quino code from Xamarin projects. Any support we build for mobile projects will proceed in a separate repository from the very beginning.
We’ll keep you posted on work and improvements and news in this area.
Customers will, for the most part, not notice this change, except in minor version numbers. Core and platform versions may (and almost certainly will) diverge between major versions. For major versions, we plan to ship all platforms with a single version number.
Published by marco on 20. Jan 2019 21:44:30 (GMT-5)
The earthli blogging format uses HTML-like formatting, described in the lengthy manual (with examples). However, Encodo’s blogging back-end now uses Umbraco, with Markdown for content. I used to be able to cross-post with ease, by copy/pasting. Now, I need to convert the content from Markdown to earthli formatting.
The following steps suffice to convert any article:
### ([^\n]+)$
=> <h level="3">\1</h>
## ([^\n]+)$
=> <h>\1</h>
\[([^!][^\]]+)\]\(([^\)]+)\)
=> <a href="\2">\1</a>
\*\*([^\*]+)\*\*
=> <b>\1</b>
_([^_]+)_
=> <i>\1</i>
```txt\n([^`]+)\n```
=> <pre>\1</pre>
```[a-z]+\n([^`]+)\n```
=> <code>\1</code>
`([^`]+)`
=> <c>\1</c>
I haven’t automated this process yet because I only rarely transfer articles.
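If I ever do automate it, the replacements above would translate almost directly into a chain of Regex.Replace calls in C#. Here’s a rough sketch (my own, untested against real articles; the patterns are copied from the list above and applied in the same order, which matters because the ### rule must run before the ## rule):

using System.Text.RegularExpressions;

public static class EarthliConverter
{
    // Applies the Markdown-to-earthli replacements from the list above, in order.
    public static string Convert(string markdown)
    {
        var result = markdown;

        result = Regex.Replace(result, @"### ([^\n]+)$", "<h level=\"3\">$1</h>", RegexOptions.Multiline);
        result = Regex.Replace(result, @"## ([^\n]+)$", "<h>$1</h>", RegexOptions.Multiline);
        result = Regex.Replace(result, @"\[([^!][^\]]+)\]\(([^\)]+)\)", "<a href=\"$2\">$1</a>");
        result = Regex.Replace(result, @"\*\*([^\*]+)\*\*", "<b>$1</b>");
        result = Regex.Replace(result, "_([^_]+)_", "<i>$1</i>");
        result = Regex.Replace(result, "```txt\n([^`]+)\n```", "<pre>$1</pre>");
        result = Regex.Replace(result, "```[a-z]+\n([^`]+)\n```", "<code>$1</code>");
        result = Regex.Replace(result, "`([^`]+)`", "<c>$1</c>");

        return result;
    }
}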
Published by marco on 8. Jan 2019 22:46:23 (GMT-5)
Updated by marco on 20. Jan 2019 11:22:48 (GMT-5)
“In practice, nearly everything you write is potentially dependent upon the order of evaluation, but in practice it isn’t because you are not a nincompoop.”
He completes the thought with “[b]ut the compiler doesn’t know that. The compiler must adhere to the letter of the language standard, because it has to compile insane code as well as sane code.”
Published by marco on 8. Jan 2019 22:28:18 (GMT-5)
The article Fear, trust and JavaScript: When types and functional programming fail presents issues in JavaScript and a solution: use another language. It lists several newer ones that are completely untested.
But the main problem that the article mentions can’t be solved 100% by any language. The main problem is at the boundaries of your application: inputs.
When you get data from an external source, you have to validate it somehow before passing it along to the rest of the application.
No language can remove this requirement. It doesn’t matter how functional, curryable, immutable or sexy it is; it just can’t do it. What you have instead is languages with more built-in mechanisms for defining types that allow the rest of the program to work safely with the data, once it’s been validated.
So if your language supports immutability and types, then you can validate that the data is OK before hydrating the object from the serialized source (e.g. JSON).
What we’re trying to avoid is unexpected runtime errors, no? Or, at the very least, we want a runtime error of a known type that precisely identifies the problem with the incoming data. That is, the data either conforms to the definition—and the definition is statically typed—or there is an error.
The desire is to push this gatekeeper/conversion to a single place so that the rest of the application works with the compiler to find errors rather than the programmer defensively checking throughout the source.
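To make that gatekeeper concrete in a statically typed language, here’s a minimal C# sketch (my own example, not from the article; the CustomerDto/Customer types and the Newtonsoft.Json dependency are just illustrative choices): deserialize into a permissive DTO at the boundary, validate it once, and hand the rest of the application an immutable type that is valid by construction.

using System;
using Newtonsoft.Json;

// Boundary type: shaped like the incoming JSON and allowed to be incomplete.
public class CustomerDto
{
    public string Name { get; set; }
    public int? Age { get; set; }
}

// Domain type: immutable and always valid once constructed.
public sealed class Customer
{
    public Customer(string name, int age)
    {
        if (string.IsNullOrWhiteSpace(name)) { throw new ArgumentException("Name is required.", nameof(name)); }
        if (age < 0) { throw new ArgumentOutOfRangeException(nameof(age)); }

        Name = name;
        Age = age;
    }

    public string Name { get; }
    public int Age { get; }
}

public static class CustomerGate
{
    // The single place where untrusted input becomes a trusted type.
    public static Customer FromJson(string json)
    {
        var dto = JsonConvert.DeserializeObject<CustomerDto>(json);
        if (dto == null || dto.Age == null)
        {
            throw new ArgumentException("Payload is missing required fields.", nameof(json));
        }

        return new Customer(dto.Name, dto.Age.Value);
    }
}

Everything downstream of FromJson() can then lean on the compiler instead of re-checking the data defensively.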
However, suggesting that PureScript or Elm or ClojureScript are somehow better at doing this than JavaScript is incorrect. Where they are better is in providing language mechanisms that allow you to precisely define the shape of the data.
Despite the author’s suggestions, they are not that much different than TypeScript. The only difference being that TypeScript chose to stay much closer to JavaScript for compatibility reasons. At the time that TypeScript came out, this was a reasonable requirement, since almost no-one wanted to move completely away from JavaScript.
Five years later and the development world is ready for other languages. With WASM (Web Assembly) as a target (instead of just JavaScript), there are more possibilities than ever.
JavaScript as a compile target is still open to runtime errors. When you use a higher-level language, you’re restricting the range of functionality that you can use in the target bytecode/machine code. That is, when you write an if-statement in C, you’re using the JMP statement, but you’re only able to JMP to certain address locations instead of anywhere in addressable memory.
It’s the same with JavaScript as a compile target. It doesn’t really matter that JavaScript allows too much—what matters is what the higher-level language allows. TypeScript may still allow too much, but it’s worlds better than JavaScript.
It’s true that PureScript or Elm or ClojureScript can close some loopholes that TypeScript leaves open. That’s fine. But if you’re going to just use JavaScript (or WASM) as a compile target, then why not choose a more-established language like C# or F#?
Published by marco on 8. Jan 2019 22:24:05 (GMT-5)
The post on Reddit called Someone asked me to make a site for them and I don’t know how the fuck I’m supposed to go about it. is about exactly what it sounds like it’s about. Amid the flurry of comments with recommendations on how to pretend he (or she) knows how to build a web site by using tools he’s (or she’s) never heard of, I chimed in with,
What is it about software that makes people who have never done it think that they can do it professionally?
What if your neighbor had heard you were a carpenter and had asked you to make a dining-room set for “good money”? Would you watch YouTube videos about how to make furniture and then charge money for the first furniture you ever made?
What about if they’d asked for a haircut/trim/style/dye? Would you just go for it, after having asked around on /r/coiffeur for a few minutes?
Or maybe they’d heard you were a chef and offered “good money” to cook their Thanksgiving dinner for them? Would you risk doing that?
Probably not, because if you’ve never done any of those things, you’re not good at them and charging for doing them can only backfire horribly.
Unless your neighbor is a sap and a fool, in which case go for it.
Published by marco on 31. Dec 2018 22:55:26 (GMT-5)
The article Deciphering The Postcard Sized Raytracer by Fabien Sanglard is a wonderfully presented breakdown of how the path tracer found on a postcard does its magic. It’s not super-fast (it takes 3 minutes to produce a much rougher version on the author’s machine). He includes his final cleaned-up source code.
It comes from the same person who made the business card ray-tracer discussed in the article Decyphering The Business Card Raytracer by Fabien Sanglard.
Published by marco on 30. Dec 2018 23:03:19 (GMT-5)
Updated by marco on 30. Dec 2018 23:03:37 (GMT-5)
The article ”Modern” C++ Lamentations by Aras Pranckevicius is a wide-ranging rant about the inefficiency of C++ template programming and the degree to which it’s inappropriate for many of the areas where C++ is used. Aras is one of the developers for the Unity game engine.
In particular, he highlights the disastrous compilation and execution speeds when using a lot of the STL. Not only that, but the debugging time is extremely slow, due to the inordinate amount of extra symbol information associated with hundreds of thousands of lines of code pulled in to implement relatively simple concepts that are standard in other languages, libraries and runtimes.
On top of it all, even the high-level C++ code isn’t very easy to read, despite the tremendous amount of abstraction.
The optimized version of C++ code has an even worse compilation time, but it has a comparable/reasonable run-time to the C/C++-style version. However, it’s very difficult to debug optimized code, which makes it doubly bad for development. Interactive development is hindered because of long compile times and, when debugging is necessary, most introspection tools don’t work (e.g. reading variables) very well. It’s the rare developer who can make headway debugging optimized code.
He compares versions of an algorithm built using “classic” C/C++ programming vs. STL programming. He then compares to C#, which compiles and runs and debugs very quickly—and is very easy to read, to boot.
The problem with C++ boils down to its approach of making “everything a library”. It’s almost like an exercise in abstraction: since a few generic-programming concepts can be used to build everything in the library rather than the language, that’s what C++ does. It’s almost as if it does it to prove that it can be done. I’m all for removing redundancy in a language, but C++ is far from such a language. It’s almost like the designers don’t use their own language.
He cites Christer Ericson (Twitter):
“Goal of programmers is to ship, on time, on budget. It’s not “to produce code.” IMO most modern C++ proponents 1) overassign importance to source code over 2) compile times, debug[g]ability, cognitive load for new concepts and extra complexity, project needs, etc. 2 is what matters.”
Aras continues discussing the future of C++ and how it is currently used in game companies, for example. These are the companies using C++ the most. Rust is making some inroads, but the area is dominated by C/C++.
Finally, he has some good advice for programmers—for any professional, really—on how to take criticism and turn it into something useful.
“Ignoring literal trolls who complain on the internet “just for the lulz”, [the] majority of complaints do have [an] actual issue or problem behind it. It might be worded poorly, or exaggerated, or whoever is complaining did not think about other possible viewpoints, but there is a valid issue behind the complaint anyway.
“What I do whenever someone complains about thing I’ve worked on, is try to forget about “me” and “work I did”, and get their point of view. What are they trying to solve, and what problems do they run into? The purpose of any software/library/language is to help their users solve the problems they have. It might be a perfect tool at solving their problem, an “ok I guess that will work” one, or a terribly bad one at that.”
As a postscript, the article It is fast or it is wrong by Nikita Tonsky discusses a very similar issue with Clojure vs. ClojureScript.
“What do ClojureScript/Google Closure compilers do for so long? They are wasting your time, that’s what. Of course it’s nobody’s fault, but in the end, this whole solution is simply wrong. We can do the same thing much faster, we have proof of that, we have the means to do it, it just happens that we are not. But we could. If we wanted to. That huge overhead you’re paying, you’re paying it for nothing. You don’t get anything from being on JS, except a 2× performance hit and astronomical build times.”
I find these points interesting because programming is very much about which tools you use and how they help you to turn your work around more quickly. I’m in charge of choosing which languages, libraries and tools we use at Encodo and I’m hyper-aware of the efficiency losses when developers are hindered by their tools or libraries. Being the lead developer of our framework Quino makes me doubly aware of this.
If you have a very slow feedback loop, then you’ll take much longer to get your work done. I remember back in the late 90s/early 2000s, working with C++, where I would have to schedule builds because it took over 30 minutes to rebuild all of my static libraries if I made a low-level change. This was on a project that cross-compiled to Mac and Windows. Instead of working on my project, I spent way too much time massaging PCH files and avoiding making low-level changes so that I could continue testing.
Bad tools that run too slowly are a problem. That’s why you should always be very careful in choosing your languages, libraries and environments. Jumping ship to the “new hotness” very often means that you’re going to have your time wasted by tools that aren’t ready for prime time.
Published by marco on 30. Dec 2018 22:12:44 (GMT-5)
If you’re a .NET developer, this is the video you’ve been looking for:
Immo tells you everything you need to know about Nuget, using Package References, switching to .NET Core, and using Assembly-Binding Redirects in .NET Framework (they’re not necessary in .NET Core). He also includes an effusive apology for the nightmare of compatibility issues that accompanied the purported interoperability between .NET 4.6.1 and .NET Core.
If you want to be compatible with .NET Standard 1.5 or lower, then you can use .NET Framework 4.6.1. For .NET Standard 2.0 compatibility, you should definitely use .NET Framework 4.7.2 instead.
He includes a list of resources for digging through open-source code and checking platform and target compatibility.
While you can use Microsoft Docs to find out which targets or platforms support which APIs, this resource lets you do it faster.
You can browse a giant list of namespaces and click on any one of them to see the types, and then drill down to properties and methods. For each level, you can see a nice list of supported targets/platforms and the assemblies to use.
You can also “Search”, which opens what looks like a terminal that lets you camel-case search for your namespace, type or member. Selecting a result takes you to the location in the catalog.
Yes, you read that correctly. I had no idea that this existed—I’ve been digging through decompiled assembly code instead. This is much faster and includes the original documentation and comments. The source is syntax-highlighted and all types, methods and properties are linked.
There’s a document explorer, namespace explorer and project manager, all linked up very nicely. You can click any element and show all references in a separate pane. Clicking one of those references navigates there—and other references in that file are also highlighted.
If that’s not sufficient, you can even download the entire source code as a ZIP file from here—complete with solution and project files so you can open it in Visual Studio for browsing.
This is a NuGet package browser combined with an API browser over all of the assemblies in a package.
It’s an open-source GitHub project, so you could even run your own copy for diffing privately published packages.
Published by marco on 16. Jul 2018 21:55:42 (GMT-5)
I just ran into an issue recently where a concrete implementation registered as a singleton was suddenly not registered as a singleton because of architectural changes.
The changes involved creating mini-applications within a main application, each of which has its own IOC. Instead of creating controllers using the main application, I was now creating controllers with the mini-application instead (to support multi-tenancy, of which more in an upcoming post).
Controllers are, by their nature, transient; a new controller is created to handle each incoming request.
In the original architecture, the concrete singleton was injected into the controller and all controller instances used the same shared instance. In the new architecture, the registration was not present in the mini-application (at first), which led to a (relatively) subtle bug: a transient and freshly created instance was injected into each new controller.
In cases where the singleton is a stateless algorithm, this wouldn’t be a logical problem at all. At the very worst, you’re over-allocating—but you probably wouldn’t notice that, either. In this case, the singleton was a settings object, configured at application startup. The configured object was still in the main application’s IOC, but not registered in the mini-application’s IOC.
Because the singleton was registered on a concrete type rather than an interface, the semantic error occurred silently instead of throwing a lifestyle-mismatch or unregistered-interface exception.
This is only one of the reasons that I recommend using interfaces as the anchoring type of an IOC registration.
To fix the issue, I did exactly this: I extracted an interface from the class and used the interface everywhere (except for the implementing type of the registration). Re-running the test caused an immediate exception rather than a strange data bug (which resulted because the default configuration in the concrete type was just correct enough to allow it to limp to a result).
To show an example, instead of the following,
application.RegisterSingle<ApiSettings>()
I used,
application.RegisterSingle<IApiSettings, ApiSettings>()
This still didn’t fix the crash because the mini-application doesn’t get that registration automatically.
I also can’t use the same registration as above because that would just create a new unconfigured ApiSettings
in each mini-application (the same as I had before, but now as a singleton). To go that route, I would have to replicate the configuration-loading for the ApiSettings
as well. And I don’t want to do that.
Instead, I just injected the IApiSettings
from the main application to the component responsible for creating the mini-application and registered the object as a singleton directly, as shown below.
public class MiniApplicationFactory
{
public MiniApplicationFactory([NotNull] IApiSettings apiSettings)
{
if (apiSettings == null) { throw new ArgumentNullException(nameof(apiSettings)); }
_apiSettings = apiSettings;
}
IApplication CreateApplication()
{
return new Application().UseRegisterSingle(_apiSettings);
}
[NotNull]
private readonly IApiSettings _apiSettings;
}
On a side note, whereas C# syntax has become more concise and powerful from version to version, I still think it has a way to go in terms of terseness for such simple objects. For such things, Kotlin and TypeScript nicely illustrate what such a syntax could look like. [1]
I mentioned above that this is only “one” of the reasons I don’t like registering concrete singletons. The other two reasons are:
I’m still waiting for C# to clean up a bit more of this syntax for me. The [NotNull]
should be a language feature checked by the compiler so that the ArgumentNullException
is no longer needed. On top of that, I’d like to see parameter properties, as in TypeScript (this is where you can prefix a constructor parameter with a keyword to declare and initialize it as a property). With a few more C#-language iterations that included non-nullable reference types and parameter properties, the example could look like the code below:
public class MiniApplicationFactory
{
public MiniApplicationFactory(private IApiSettings apiSettings)
{
}
IApplication CreateApplication()
{
return new Application().UseRegisterSingle(apiSettings);
}
}
Published by marco on 24. May 2018 22:12:33 (GMT-5)
The Quino roadmap shows you where we’re headed. How do we plan to get there?
A few years back, we made a big leap in Quino 2.0 to split up dependencies in anticipation of the initial release of .NET Core. Three tools were indispensable: ReSharper, NDepend and, of course, Visual Studio. Almost all .NET developers use Visual Studio, many use ReSharper and most should have at least heard of NDepend.
At the time, I wrote a series of articles on the migration from two monolithic assemblies (Encodo and Quino) to dozens of layered and task-specific assemblies that allow applications to include our software in a much more fine-grained manner. As you can see from the articles, NDepend was the main tool I used for finding and tracking dependencies. [1] I used ReSharper to disentangle them.
Since then, I’ve not taken advantage of NDepend’s features for maintaining architecture as much as I’d like. I recently fired it up again to see where Quino stands now, with 5.0 in beta.
But, first, let’s think about why we’re using yet another tool for examining our code. Since I started using NDepend, other tools have improved their support for helping a developer maintain code quality.
the IDisposable pattern. The Portability Analysis is essential for moving libraries to .NET Standard but doesn’t offer any insight into architectural violations like NDepend does.
With a concrete .NET Core/Standard project in the wings/under development, we’re finally ready to finish our push to make Quino Core ready for cross-platform development. For that, we’re going to need NDepend’s help, I think. Let’s take a look at where we stand today.
The first step is to choose what you want to cover. In the past, I’ve selected specific assemblies that corresponded to the “Core”. I usually do the same when building code-coverage results, because the UI assemblies tend to skew the results heavily. As noted in a footnote below, we’re starting an effort to separate Quino into high-level components (roughly, a core with satellites like Winform, WPF and Web). Once we’ve done that, the health of the core itself should be more apparent (I hope).
For starters, though, I’ve thrown all assemblies in for both NDepend analysis as well as code coverage. Let’s see how things stand overall.
The amount of information can be quite daunting but the latest incarnation of the dashboard is quite easy to read. All data is presented with a current number and a delta from the analysis against which you’re comparing. Since I haven’t run an analysis in a while, there’s no previous data against which to compare, but that’s OK.
Let’s start with the positive.
Now to the cool part: you can click anything in the NDepend dashboard to see a full list of all of the data in the panel.
Click the “B” on technical debt and you’ll see an itemized and further-drillable list of the grades for all code elements. From there, you can see what led to the grade. By clicking the “Explore Debt” button, you get a drop-down list of pre-selected reports like “Types Hot Spots”.
Click lines of code and you get a breakdown of which projects/files/types/methods have the most lines of code
Click failed quality gates to see where you’ve got the most major problems (Quino currently has 3 categories)
Click “Critical” or “Violated” rules to see architectural rules that you’re violating. As with everything in NDepend, you can pick and choose which rules should apply. I use the default set of rules in Quino.
Most of our critical issues are for mutually-dependent namespaces. This is most likely not root namespaces crossing each other (though we’d like to get rid of those ASAP) but sub-namespaces that refer back to the root and vice-versa. This isn’t necessarily a no-go, but it’s definitely something to watch out for.
There are so many interesting things in these reports:
Click the “Low” issues (Quino has over 46,000!) and you can see that NDepend analyzes your code at an incredibly low level of granularity.
Finally, there’s absolutely everything, which includes boxing/unboxing issues [7], method-names too long, large interfaces, large instances (could also be generated classes).
These are already marked as low, so don’t worry that NDepend just rains information down on you. Stick to the critical/high violations and you’ll have real issues to deal with (i.e. code that might actually lead to bugs rather than code that leads to maintenance issues or incurs technical debt, both of which are more long-term issues).
What you’ll also notice in the screenshots is that NDepend doesn’t just provide pre-baked reports: everything is based on its query language. That is, NDepend’s analysis is lightning fast (it takes only a few seconds for all of Quino), during which it builds up a huge database of information about your code that it then queries in real-time. NDepend provides a ton of pre-built queries linked from all over the UI, but you can adjust any of those queries in the pane at the top to tweak the results. The syntax is Linq to Sql and there are a ton of comments in the query to help you figure out what else you can do with it.
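For a flavor of what those queries look like, the sketch below shows roughly the shape of a typical rule. I’m reproducing the style from memory, so treat the exact property names as approximate rather than as NDepend’s canonical rule text:

// Flag methods that are getting too long (CQLinq-style rule, approximate).
warnif count > 0
from m in JustMyCode.Methods
where m.NbLinesOfCode > 30
orderby m.NbLinesOfCode descending
select new { m, m.NbLinesOfCode }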
As noted above, the amount of information can be overwhelming, but just hang in there and figure out what NDepend is trying to tell you. You can pin or hide a lot of the floating windows if it’s all just a bit too much at first.
In our case, the test assemblies have more technical debt than the code they test. This isn’t optimal, but it’s better than the other way around. You might be tempted to exclude test assemblies from the analysis, to boost your grade, but I think that’s a bad idea. Testing code is production code. Make it just as good as the code it tests to ensure overall quality.
I did a quick comparison between Quino 4 and Quino 5 and we’re moving in the right direction: the estimation of work required to get to grade A was already cut in half, so we’ve made good progress even without NDepend. I’m quite looking forward to using NDepend more regularly in the coming months. I’ve got my work cut out for me.
nant clean command. I’d moved the ndepend out folder to the common folder and our command wiped out the previous results. I’ll work on persisting those better in the future.
I generated coverage data using DotCover, but realized only later that I should have configured it to generate NDepend-compatible coverage data (as detailed in NDepend Coverage Data). I’ll have to do that and run it again. For now, no coverage data in NDepend. This is what it looks like in DotCover, though. Not too shabby:
Published by marco on 31. Mar 2018 23:28:27 (GMT-5)
The long and technical article Files are hard by Dan Luu discusses several low-level and scholarly analyses of how common file-systems and user-space applications deal with read/write errors.
File-system operations work with devices and are thus asynchronous by nature. The analyses discovered similar ordering issues as with multi-threaded code.
“The most common class of error was incorrectly assuming ordering between syscalls. The next most common class of error was assuming that syscalls were atomic2. These are fundamentally the same issues people run into when doing multithreaded programming. Correctly reasoning about re-ordering behavior and inserting barriers correctly is hard. But even though shared memory concurrency is considered a hard problem that requires great care, writing to files isn’t treated the same way, even though it’s actually harder in a number of ways.”
This is why most applications should use a framework or runtime support to access the file system. Even this might not be enough, though, if the implementation is still not robust enough for the application requirements. The .NET runtime has for quite a while now offered an API that uses async/await (i.e. a promise/future-based API), which at the very least indicates the asynchronous nature of these calls, with separate paths for success and error. This is better than nothing, even if the implementation occasionally fails to properly propagate errors (as we see with the POSIX APIs below).
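As a rough illustration (my own sketch, not from the article), this is what consuming that API looks like in practice; note that an I/O failure surfaces as an exception at the await instead of being silently swallowed:

using System.IO;
using System.Threading.Tasks;

public static class TextFileReader
{
    // Reads a file asynchronously; errors propagate as exceptions at the await.
    public static async Task<string> ReadAllTextAsync(string path)
    {
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize: 4096, useAsync: true))
        using (var reader = new StreamReader(stream))
        {
            return await reader.ReadToEndAsync();
        }
    }
}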
At any rate, the article drives home the point that programming against file systems is hard.
“People almost always just run some tests to see if things work, rather than making sure they’re coding against what’s legal in a POSIX filesystem.”
Having a few tests is better than nothing, but it’s even better to hoist your code up as many levels of abstraction as possible and avoid having to know about how to interleave fsync
calls at all. Unless you’re writing a database or a source-control system, right?
He goes on to discuss “how much misinformation is out there” and that “it’s hard for outsiders to troll through a decade and a half of mailing list postings to figure out which ones are still valid and which ones have been obsoleted”
This is a common problem that applies not just to low-level systems programming, but to any other programming problem. We have a surfeit of choice: just search online and you’ll find something that matches what you searched.
I recently ran into this phenomenon when learning Docker. Docker has changed and improved so much that the Internet is literally littered with old and overly complicated solutions to problems that either no longer exist or that can be solved with a simple one-liner in a configuration file. If you follow the instructions you find online, it’s possible that you’ll have something that works the way you want it to, but it’s also very likely that you’ll end up with a Frankenstein’s Monster of a setup that kind of works but is fragile in unnecessary ways.
From the article:
“So far, we’ve assumed that the disk works properly, or at least that the filesystem is able to detect when the disk has an error via SMART or some other kind of monitoring. I’d always figured that was the case until I started looking into it, but that assumption turns out to be completely wrong.”
That sounds bad, of course. It’s not something we user-space programmers ever really think about, is it? You read from a file, you write to a file, it works, right? And if it doesn’t work (super-rare, right?), then the runtime throws an exception.
If we assume that the runtime throws an exception, we’re also assuming that the runtime is notified when an error occurs during a read or write operation. This was, apparently, not the case (at least in 2005-2008; we’ll see improvements below).
“In one presentation, one of the authors remarked that the ext3 code had lots of comments like “I really hope a write error doesn’t happen here” in places where errors weren’t handled. […] NTFS is somewhere in between. The authors found that it has many consistency checks built in, and is pretty good about propagating errors to the user. However, like ext3, it ignores write failures.”
Ignoring write failures! That’s kind of incredible, but if you’ve ever relied heavily on NTFS, you know that there are bugs in it. Sometimes files are just mysteriously locked and inaccessible until the system is rebooted. Why does the problem go away on reboot? NTFS is journaled and can recover its data, but it needs to be unmounted and checked. Instead of panicking, the write error is ignored. [1]
“At this point, we know that it’s quite hard to write files in a way that ensures their robustness even when the underlying filesystem is correct, the underlying filesystem will have bugs, and that attempting to repair corruption to the filesystem may damage it further or destroy it.”
The papers referenced in the first article are quite old (a decade or more) but the conclusions are still fascinating. Luu discusses the need for replicating the study and laments that “replications usually give little to no academic credit. This is one of the many cases where the incentives align very poorly with producing real world impact.”
Happily, Luu followed up with another post, called File-system error-handling that reproduces some of the original results with the 2017 versions of the file systems. This is an interesting study in its own right, discussing in detail interesting nuggets like the fact that “apfs doesn’t checksum data because “[apfs] engineers contend that Apple devices basically don’t return bogus data”.” (from APFS in Detail: Data Integrity).
The second article concludes that “Filesystem error handling seems to have improved.” Basic write errors are now propagated to user-space wherever possible (i.e. if the drive is not dead). However, “[m]ost filesystems don’t have checksums for data and leave error detection and correction up to userspace software.” This is probably something that most user-space software developers never think about, but it’s crucially important. Does your software assume that the file system will always throw an error? Or does it “just assume[…] that filesystems and disks don’t have errors”?
The first article concludes with a citation from Butler Lampson:
“Lampson suggests that the best known general purpose solution is to package up all of your parallelism into as small a box as possible and then have a wizard write the code in the box.”
This is generally a good approach for anything complicated: programmers should use as high-level an API as possible for a given task. Problems like security, memory-allocation, file-system access, networking, asynchronous/parallel programming…these all fall into that category. Generally, the advice is, as usual, to get your requirements, make components that satisfy those requirements and include automated tests that verify that the components will continue to satisfy the requirements.
As Lampson says, don’t write code that’s beyond you—get a “wizard” to write it instead. That’s what most of us do when we use the runtime provided with our language. [2]
The best you can usually do is to abstract away access to external systems (including the file system) so that you can improve behavior later, should it be required. The budget and reliability constraints of a project don’t always allow you to program perfectly safely. What you can do is to make sure that the system can be made safer later with a reasonable amount of effort. To be clear: don’t be unnecessarily sloppy, but don’t tank your project guaranteeing NASA-level safety where it’s not needed.
So what does that mean? If you’re programming on .NET, it means you should probably stay away from some constructs that you’ve previously considered safe and not worth wrapping, like File or Directory. Instead of using these directly, use them from an injected service. This level of abstraction is not difficult to enforce if introduced early in a project and will allow for improved testing anyway. If the filesystem is abstracted, components will no longer need their tests to actually write out files in order to work.
As discussed above, this isn’t to say that you jeopardize your deadline to abstract away every single file-system reference. For some applications, file-system access is so intrinsic as to be un-mockable (e.g. databases, source-control, etc.). However, your application is probably not one of those. It’s likely that your application reads/writes files in a highly localizable manner that could be wrapped in a simple component.
This advice is similar to the by-now common practice of not using the global DateTime.UtcNow. How can this be a problem? Well, if code uses an IClock component instead, then tests can adjust “now” to be a point in the past or future and test scheduling components more easily. It’s an easy pattern to follow in new code that pays for itself the first time you need to reproduce a timing problem.
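A minimal sketch of both abstractions follows (the interface and class names are mine, invented for illustration; real frameworks have their own variants):

using System;
using System.IO;

// Components depend on these interfaces instead of File/Directory/DateTime.UtcNow,
// so tests can substitute an in-memory file system or a fixed clock.
public interface IClock
{
    DateTime Now { get; }
}

public interface IFileData
{
    string ReadAllText(string path);
    void WriteAllText(string path, string contents);
}

public class SystemClock : IClock
{
    public DateTime Now => DateTime.UtcNow;
}

public class PhysicalFileData : IFileData
{
    public string ReadAllText(string path) => File.ReadAllText(path);

    public void WriteAllText(string path, string contents) => File.WriteAllText(path, contents);
}

In tests, a fixed clock or an in-memory IFileData takes their place; production code registers the physical implementations in the IOC.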
At the end of the second article, there’s an interesting discussion of how to avoid these kind of bugs—or just bugs, in general.
“There’s a very old debate over how to prevent things like this from accidentally happening.”
Better “tools or processes”? Be “better programmers”? Are tools like guardrails? Does it make sense to keep driving, bashing back and forth across the road, but happy that the guardrails are keeping us on the road at all? Would you do that in a car?
Well, no.
But, yes, if that’s the best option? What’s the other option? Just stop the car and don’t go anywhere anymore? Or get out and walk?
That analogy has been beaten to death—and I don’t think it’s very appropriate (as you can see from my discussion about abstraction above). Tools and processes are better than nothing. Proper programming practices and patterns are, as well. If you train yourself to use tried-and-true patterns, then you automatically avoid common errors.
The point isn’t to be able to say that “there are no bugs”; it’s to be able to say that “these tested bugs won’t happen”. The point is to use practices that avoid whole classes of problems.
“Even better than a static analysis tool would be a language that makes it harder to accidentally forget about checking for an error.”
And now we come to the justification for some of the newer languages out there. Rust is such a language, which attempts to fix many of the shortcomings of C and C++ in the domain of allocating, sharing, modifying and freeing memory.
For error-handling, the article The Error Model by Joe Duffy discusses a very interesting and promising approach taken by a Microsoft Research team with Midori, a 100%-managed version of Windows. The basic insight is to separate bugs from recoverable errors and unrecoverable errors.
A bug is something the user-space application did wrong (e.g. passing a null reference to a method that expects only non-null references). A recoverable error is a validation error encountered when processing user input. An unrecoverable error is a file-read error in a base configuration file or a stack overflow or an out-of-memory error.
For almost all software, file-system errors are something that should just be considered an unrecoverable error. There is no reason why most applications should attempt to continue when e.g. the main configuration cannot be loaded. Most applications don’t even need to be able to recover from that. The problem occurs so rarely that you should just get a file out of backup.
Lower-level applications like Git or PostgreSql have to take more care to deal with file-system errors [5], but your software most likely doesn’t need to handle them. As discussed above, be aware that they can happen, abstract your code from the file-system so you can test error situations and improve handling where needed, but fail fast unless your project has a requirement to be able to recover in error conditions.
Generally, no-one expects a user-space application to include robust file-recovery. It’s expected, though, that the application detects when something is wrong and reports it, failing fast rather than just limping along and corrupting data.
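As a final sketch (again my own, not from the article), failing fast at startup can be as simple as refusing to continue when a required file can’t be read:

using System;
using System.IO;

public static class StartupConfiguration
{
    // Loads a required configuration file; if it can't be read, the application
    // reports the problem and stops instead of limping along with defaults.
    public static string LoadRequiredText(string path)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (Exception exception) when (exception is IOException || exception is UnauthorizedAccessException)
        {
            throw new InvalidOperationException(
                $"Cannot start: required configuration '{path}' could not be read.", exception);
        }
    }
}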
NULL bytes after certain catastrophic operations.
Published by marco on 14. May 2017 21:38:17 (GMT-5)
Updated by marco on 15. May 2017 08:36:05 (GMT-5)
.NET Standard 2.0 is finally publicly available as a preview release. I couldn’t help myself and took a crack at converting parts of Quino to .NET Standard just to see where we stand. To keep me honest, I did all of my investigations on my MacBook Pro in MacOS.
I installed Visual Studio for Mac, the latest JetBrains Rider EAP and .NET Standard 2.0-preview1. I already had Visual Studio Code with the C#/OmniSharp extensions installed. Everything installed easily and quickly and I was up-and-running in no time.
Armed with 3 IDEs and a powerful command line, I waded into the task.
Quino is an almost decade-old .NET Framework solution that has seen continuous development and improvement. It’s quite modern and well-modularized, but we still ran into considerable trouble when experimenting with .NET Core 1.1 almost a year ago. At the time, we dropped our attempts to work with .NET Core, but were encouraged when Microsoft shifted gears from the extremely low–surface-area API of .NET Core to the more inclusive though still considerably cleaned-up API of .NET Standard.
Since it’s an older solution, Quino projects use the older csproj file-format: the one where you have to whitelist the files to include. Instead of re-using these projects, I figured a good first step would be to use the dotnet
command-line tool to create a new solution and projects and then copy files over. That way, I could be sure that I was really only including the code I wanted—instead of random cruft generated into the project files by previous versions of Visual Studio.
The dotnet Command
The dotnet command is really very nice and I was able to quickly build up a list of core projects in a new solution using the following commands:
dotnet new sln
dotnet new classlib -n {name}
dotnet add reference {../otherproject/otherproject.csproj}
dotnet add package {nuget-package-name}
dotnet clean
dotnet build
That’s all I’ve used so far, but it was enough to investigate this brave new world without needing an IDE. Spoiler alert: I like it very much. The API is so straightforward that I don’t even need to include descriptions for the commands above. (Right?)
Everything really seems to be coming together: even the documentation is clean, easy-to-navigate and has very quick and accurate search results.
- Encodo.Core compiles (almost) without change. The only change required was to move project-description attributes that used to be in the AssemblyInfo.cs file to the project file instead (where they admittedly make much more sense). If you don’t do this, the compiler complains about “[CS0579] Duplicate ‘System.Reflection.AssemblyCompanyAttribute’ attribute” and so on.
- Encodo.Expressions references Windows.System.Media for Color and the Colors constants. I changed those references to System.Drawing and Color, respectively—something I knew I would have to do.
- Encodo.Connections references the .NET-Framework–only WindowsIdentity. I will have to move these references to an Encodo.Core.Windows project and move creation of the CurrentCredentials, AnonymousCredentials and UserCredentials to a factory in the IOC.
- Quino.Meta references the .NET-Framework–only WeakEventManager. There are only two references and these are used to implement a CollectionChanged feature that is nearly unused. I will probably have to copy/implement the WeakEventManager for now until we can deprecate those events permanently.
- Quino.Data depends on Quino.Meta.Standard, which references System.Windows.Media (again) as well as a few other things. The Quino.Meta.Standard potpourri will have to be split up.
So far, porting to .NET Standard is a much more rewarding process than our previous attempt at porting to .NET Core.
At this point, I had a shadow copy of a bunch of the core Quino projects with new project files as well as a handful of ad-hoc changes and commented code in the source files. While OK for investigation, this was not a viable strategy for moving forward on a port for Quino.
I want to be able to work in a branch of Quino while I further investigate the viability of:
To test things out, I copied the new Encodo.Core
project file back to the main Quino workspace and opened the old solution in Visual Studio for Mac and JetBrains Rider.
Visual Studio for Mac says it’s a production release, but it stumbled right out of the gate: it failed to compile Encodo.Core
even though dotnet build
had compiled it without complaint from the get-go. Visual Studio for Mac claimed that OperatingSystem was not available. However, according to the documentation, OperatingSystem
is available for .NET Standard—but not in .NET Core. My theory is that Visual Studio for Mac was somehow misinterpreting my project file.
Update: After closing and re-opening the IDE, though, this problem went away and I was able to build Encodo.Core
as well. Shaky, but at least it works now.
Unfortunately, working with this IDE remained difficult. It stumbled again on the second project that I changed to .NET Standard. Encodo.Core
and Encodo.Expressions
both have the same framework property in their project files—<TargetFramework>netstandard2.0</TargetFramework>
—but, as you can see in the screenshot to the left, both are identified as .NETStandard.Library but one has version 2.0.0-preview1-25301-01 and the other has version 1.6.1. I have no idea where the second version number is coming from—it looks like this IDE is mashing up the .NET Framework version and the .NET Standard versions. Not quite ready for primetime.
Also, the application icon is mysteriously the bog-standard MacOS-app icon instead of something more…Visual Studio-y.
JetBrains Rider built the assembly without complaint, just as dotnet build
did on the command line. Rider didn’t stumble as hard as Visual Studio for Mac, but it also had problems building projects after the framework had changed. On top of that, it wasn’t always so easy to figure out what to do to get the framework downloaded and installed. Rider still has a bit of a way to go before I would make it my main IDE.
I also noticed that, while Rider’s project/dependencies view accurately reflects .NET Standard projects, the “project properties” dialog shows the framework version as just “2.0”. The list of version numbers makes this look like I’m targeting .NET Framework 2.0.
Additionally, Rider’s error messages in the build console are almost always truncated. The image to the right is of the IDE trying to inform me that Encodo.Logging
(which was still targeting .NET Framework 4.5) cannot reference Encodo.Core
(which references NET Standard 2.0). If you copy/paste the message into an editor, you can see that’s what it says. [1]
I don’t really know how to get Visual Studio Code to do much more than syntax-highlight my code and expose a terminal from which I can manually call dotnet build
. They write about Roslyn integration where “[o]n startup the best matching projects are loaded automatically but you can also choose your projects manually”. While I saw that the solution was loaded and recognized, I never saw any error-highlighting in VS Code. The documentation does say that it’s “optimized for cross-platform .NET Core development” and my projects targeted .NET Standard so maybe that was the problem. At any rate, I didn’t put much time into VS Code yet.
- Encodo.Core already works and there are only minor adjustments needed to be able to compile Encodo.Expressions and Quino.Meta.
- Quino.Schema, Quino.Data.PostgreSql, Encodo.Parsers.Antlr and Quino.Web: with this core, we’d be able to run the WebAPI server we’re building for a big customer on a Mac or a Linux box.
I’ll keep you posted. [2]
Encodo.Expressions.AssemblyInfo.cs(14, 12): [CS0579] Duplicate ‘System.Reflection.AssemblyCompanyAttribute’ attribute
Microsoft.NET.Sdk.Common.targets(77, 5): [null] Project ‘/Users/marco/Projects/Encodo/quino/src/libraries/Encodo.Core/Encodo.Core.csproj’ targets ‘.NETStandard,Version=v2.0’. It cannot be referenced by a project that targets ‘.NETFramework,Version=v4.5’.
Encodo.Core (NETStandard2.0) cannot be used from Encodo.Expressions (Net462), which doesn’t seem right, but I’m not going to fight with it on this machine anymore. I’m going to try it on a fully updated Windows box next—just to remove the Mono/Mac/NETCore/Visual Studio for Mac factors from the equation. Once I’ve got things running on Windows, I’ll prepare a NETStandard project-only solution that I’ll try on the Mac.
Published by marco on 1. May 2017 21:42:56 (GMT-5)
Updated by marco on 1. May 2017 22:01:15 (GMT-5)
I announced almost exactly one year ago that I was rewriting the Encodo C# Handbook. The original was published almost exactly nine years ago. There were a few more releases as well as a few unpublished chapters.
I finally finished a version that I think I can once again recommend to my employees at Encodo. The major changes are:
Here’s the introduction:
“The focus of this document is on providing a reference for writing C#. It includes naming, structural and formatting conventions as well as best practices for writing clean, safe and maintainable code. Many of the best practices and conventions apply equally well to other languages.”
Check out the whole thing (GitHub)! Or download the PDF that I included in the repository.
Published by marco on 4. Mar 2017 20:20:22 (GMT-5)
I recently fixed a bug in some TypeScript code that compiled just fine—but it looked for all the world like it shouldn’t have.
tl;dr: there is no TypeScript compiler bug, but my faith in the TypeScript language’s type model is badly shaken.
The following code compiles—and well it should.
interface IB {
  name: string;
}

interface IA {
  f(action: (p: IB) => void): IA;
}

class A implements IA {
  f = (action: (p: IB) => void): IA => {
    return this;
  }
}
Some notes on this example:
- IB isn’t relevant to the discussion.
- The purpose of IA is to require implementors to define a method named f that takes a single parameter of type (p: IB) => void and returns IA.
- The class A above satisfies this requirement. It doesn’t do anything with parameter action, but that’s OK.
- A.f() is what a naive user of TypeScript would assume was the only way of satisfying the requirement from IA.
However, the following implementations of IA
also compile.
class A2 implements IA {
  f = (action: () => IB): IA => {
    return this;
  }
}

class A3 implements IA {
  f = (action: (p: IB) => IB): IA => {
    return this;
  }
}

class A4 implements IA {
  f = (action: () => void): IA => {
    return this;
  }
}

class A5 implements IA {
  f = (): IA => {
    return this;
  }
}
The only one I tried that doesn’t compile is shown below.
class A6 implements IA {
  f = (action: (p: number) => void): IA => {
    return this;
  }
}
In this case, the TypeScript compiler rightly shows the following error:
Hovering over the class name A5
shows the following tooltip:
Class ‘A5’ incorrectly implements interface ‘IA’. Types of property ‘f’ are incompatible. Type ‘(action: (p: number) => void) => IA’ is not assignable to type ‘(action: (p: IB) => void) => IA’. Types of parameters ‘action’ and ‘action’ are incompatible. Type ‘(p: IB) => void’ is not assignable to type ‘(p: number) => void’. Types of parameters ‘p’ and ‘p’ are incompatible. Type ‘number’ is not assignable to type ‘IB’.
To summarize, the following types seem to be compatible with (p: IB) => void:
- () => IB
- (p: IB) => IB
- () => void
In a more strongly typed language like C#, it’s clear that none of this would fly. But this is TypeScript, which defines its typing model on compatibility with the dynamic language JavaScript.
It almost looks like the type of the lambda isn’t part of the type signature of the method, which came as quite a surprise to me (and also to my colleague, Urs, who is much more of a TypeScript expert than I am).
But maybe we don’t know enough about the TypeScript type system. Let’s look at the Type compatibility documentation for TypeScript.
This section starts off with a “Note on Soundness”, which suggests that what we have above is completely valid TypeScript.
“The places where TypeScript allows unsound behavior were carefully considered, and throughout this document we’ll explain where these happen and the motivating scenarios behind them.”
The section Comparing two functions starts off explaining some rather surprising things about the type-compatibility of functions: for a function to be type-compatible with another function, the types of its parameters must match the types of the target type’s parameters, but the number of parameters doesn’t have to match. So if the target type has 4 parameters and the lambda to assign has 0 parameters, that lambda is compatible.
From the manual:
let x = (a: number) => 0;
let y = (b: number, s: string) => 0;
y = x; // OK
x = y; // Error
For return types, the matching behavior is opposite. That is, a “bigger” type that satisfies the expected return type is just fine.
let x = () => ({name: "Alice"});
let y = () => ({name: "Alice", location: "Seattle"});
x = y; // OK
y = x; // Error because x() lacks a location property
Armed with this new knowledge, let’s see if the previously bizarre-seeming behavior is actually valid.
To recap, the TypeScript compiler says that the following signatures are compatible with f((p: IB) => void): IA:
- f(() => IB): IA: this is compatible because the zero parameters conform by definition and any return type is OK because void is expected.
- f((p: IB) => IB): IA: this is compatible because the single parameter conforms and any return type is OK because void is expected.
- f(() => void): IA: this is compatible because the zero parameters conform by definition and any return type is OK because void is expected.
- f(): IA: this one looks plain wrong at first, but the same logic applies to the whole function f((p: IB) => void): IA instead of to the lambda parameter for it. The interface expects a function f with a single parameter, returning IA. By the first rule above, a function with zero parameters satisfies that requirement.
- f((number) => void): IA: this does not satisfy the requirement because number is not compatible with IB.
- f(number): IA: this does not satisfy the requirement because number is not compatible with (p: IB) => void.
- f(): void: this does not satisfy the requirement because, while zero parameters is OK, the type void is smaller than IA.
Well, it looks like there’s nothing to see here, folks. The compiler is doing exactly what it’s supposed to. Move along and get on with your day.
Unfortunately, that means that TypeScript is going to be considerably less helpful for ensuring program correctness than I’d previously thought.
In fact, the caveat about TypeScript “allow[ing] unsound behavior [in] carefully considered [places]” seems a bit disingenuous because, to a programmer accustomed to something like C# or Java or Swift, this kind of type-enforcement for method compatibility cannot be relied upon to enforce much of anything.
When I read OOSC2 (Amazon) a long time ago [1], I remember how Bertrand Meyer made the distinction between the formal type of an argument (the type in the method signature) and the actual type of an argument (the runtime type).
The method-type–conformance rules for TypeScript make sense for actual arguments. They ensure compatibility with JavaScript. What’s not clear to me is why this same logic should be applied to formal arguments that are only available in TypeScript. If I declare a specific type signature in an interface, what are the odds that I want the wishy-washy JavaScript-friendly type rules for those situations? From an architect’s point of view, it would certainly be nicer to have more strict type-checking for formal definitions.
Since we don’t have that, this very lenient type-compatibility renders type-checking for lambdas largely useless in interface declarations. The compiler won’t be able to tell you that your implementation no longer matches the interface declaration because almost anything you write will actually match.
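For contrast, here is roughly how the same shape plays out in C# (my own sketch, not from the article): an implementation whose delegate parameter doesn’t match the interface exactly simply doesn’t compile.
using System;

public interface IB { string Name { get; } }

public interface IA
{
    IA F(Action<IB> action);
}

public class A : IA
{
    // Matches the interface exactly: compiles.
    public IA F(Action<IB> action) => this;
}

public class A4 : IA
{
    // Does not compile (CS0535): 'A4' does not implement 'IA.F(Action<IB>)';
    // a parameterless Action is a different delegate type, not a compatible one.
    public IA F(Action action) => this;
}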
Engineering You
Martin Thompson — Video
The keynote was about our place in... [More]
]]>Published by marco on 4. Mar 2017 00:06:33 (GMT-5)
Updated by marco on 4. Mar 2017 12:07:35 (GMT-5)
Encodo presented a short talk at Voxxed Days 2017 this year, called The truth about code reviews. Sebastian and I also attended the rest of the conference. The following is a list of notes and reactions to the talks.
Engineering You
Martin Thompson — Video
The keynote was about our place in the history of software engineering. Martin described us more as alchemists than engineers right now, a sentiment with which I can only agree. There is too little precision, too little reproducibility and too little focus on safety for us to qualify as engineers.
He gave as an example the pride with which car companies brag about the hundreds of millions of lines of code they have running in software in their cars: a claim that should send shivers down your spine. We know how this software is written and how it is tested.
Quino has fewer than 100,000 lines of code (about 85,000, at least 15% of which is obsolete) and we’ve been building that for almost 10 years. How a company whose main business is building automobiles guarantees safety and correctness of 300 million lines of code is beyond my comprehension. I would venture that they don’t.
Highly recommended talk. Very interesting. Lots of good history mixed with common-sense recommendations, like the following:
References:
He discussed a proof-of-concept transport-tracking application. Uses the SBB REST API for vehicle positions (using the same API as exposed for the app). Then there is the OpenData Transport API for station-board information, which provides details about delays. Everything is available as JSON with relatively straightforward data models.
Uses Kafka to handle this real-time data pipeline (kind of like Chronicle, RabbitMQ or EasyMQ, but from Apache). The pipeline includes reformatting the data into the desired format (mostly eliding unwanted data), then shipping it through LogStash into ElasticSearch, which allows easy querying of the stored data. This type of data isn’t fundamentally relational, so a document-based store is appropriate.
The transformation also involves extrapolating the data that you’re interested in from the data you obtained. For example, determining whether a train is stopped. E.g. are there x events with the same position? Is the position near a station?
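As a rough illustration of that kind of extrapolation (my own C# sketch with hypothetical types; the project itself used Scala):
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type: one position report for one vehicle.
public record PositionEvent(string VehicleId, double Lat, double Lon, DateTime Timestamp);

public static class TrainState
{
    // A train counts as "stopped" if its last few reports all lie within a small radius.
    public static bool IsStopped(IReadOnlyList<PositionEvent> recent, int window = 5, double toleranceMeters = 10)
    {
        if (recent.Count < window)
        {
            return false;
        }

        var last = recent.TakeLast(window).ToList();
        var first = last[0];

        return last.All(e => DistanceMeters(first, e) < toleranceMeters);
    }

    // Rough equirectangular approximation; good enough at the scale of a few meters.
    private static double DistanceMeters(PositionEvent a, PositionEvent b)
    {
        const double earthRadius = 6_371_000;
        var dLat = (b.Lat - a.Lat) * Math.PI / 180;
        var dLon = (b.Lon - a.Lon) * Math.PI / 180 * Math.Cos(a.Lat * Math.PI / 180);
        return earthRadius * Math.Sqrt(dLat * dLat + dLon * dLon);
    }
}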
It was developed in Scala with Akka actors as well as the Play framework for REST. They represented all stations and trains with actors (objects). The actors are async and can run on any number of machines.
After that comes Cassandra? Are they trying to use every possible technology? I’m losing track over here. Deployment on Docker. Also uses Zookeeper in another container for load-balancing/redundancy. OMG buzzwords.
He asks: Why not a single application on a single server? Classic Java on Tomcat? It doesn’t scale. It can only scale up, but not out. The actual solution feels like a lot of moving parts, but each part does a compartmentalized task, handing off to the next piece. It ends up being quite lightweight, using very little CPU overall.
The simple, one-use components scale natively and relatively easily (LogStash, streaming, docker). The app server using Akka can be scaled, but it’s here that you have to invest time to use the available fallback and clustering strategies.
To render the data on the map, they used React to manage the data and d3.js to render. React is fast and scalable (but as Encodo has also discovered, that’s not free either). Also, the client-side CPU usage is not insignificant, even with a lot of nodes.
He also discussed UX and UI with tests. How to visualize possibly overlapping and differently sized elements at different zoom levels.
Used Jupyter to analyze data and produce graphs.
Conclusion: offload the parts of your application that aren’t your core problem to external software and services. Things like managing data streams, transforming data, etc. Focus on your models and analyzing your data.
Functional data structures in Java
Oleg Šelajev — Video
He discussed how to build reusable structures that don’t share mutable state (non-imperative vs. functional).
A void return type is a “code smell” because the only reason to call such a method is to cause a side effect. Prefer pure methods.
Any discussion of data-structure design/implementation will naturally involve balancing performance vs. storage. The safety is baked-in, but performance is always a concern when working with immutable data structures, most especially when changing them.
Even though the average call time for a method is nearly constant (as with most mutable structures), what if you call too many expensive operations and skew the average in real-world use? Well, you can combat this by leveraging the cachability of your collections (as defined above) as a way of memoizing (a well-known performance-optimization technique which carries with it possibly higher storage costs if you can’t share the memoized instances very much.)
In some cases, you can reason about performance in the following way: if you get to a situation where you would have to do an expensive operation (e.g. the reverse implicit in balancing head/tail of a queue), you can only get to this situation by having done n cheap operations first. So it is proven that the average is still constant time.
Destructive behavior (like dequeue) looks different than with a mutable data structure. In those cases, the operation returns both the removed element as well as a reference to the queue that represents the new state of the queue.
Tuple<T, Queue<T>> Dequeue();
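A minimal sketch of that two-list queue in C# (my own illustration, not from the talk): Enqueue pushes onto the back stack; Dequeue serves from the front stack and only reverses the back stack when the front runs dry, which is exactly the amortized argument above.
using System;
using System.Collections.Immutable;

// Persistent FIFO queue built from two immutable stacks (front/back).
public sealed class PersistentQueue<T>
{
    public static readonly PersistentQueue<T> Empty =
        new PersistentQueue<T>(ImmutableStack<T>.Empty, ImmutableStack<T>.Empty);

    private readonly ImmutableStack<T> _front;
    private readonly ImmutableStack<T> _back;

    private PersistentQueue(ImmutableStack<T> front, ImmutableStack<T> back)
    {
        _front = front;
        _back = back;
    }

    public PersistentQueue<T> Enqueue(T value) => new PersistentQueue<T>(_front, _back.Push(value));

    public (T Value, PersistentQueue<T> Rest) Dequeue()
    {
        var front = _front;
        var back = _back;

        if (front.IsEmpty)
        {
            // The expensive reverse only happens after n cheap enqueues.
            while (!back.IsEmpty)
            {
                back = back.Pop(out var item);
                front = front.Push(item);
            }
        }

        if (front.IsEmpty)
        {
            throw new InvalidOperationException("Queue is empty.");
        }

        front = front.Pop(out var value);
        return (value, new PersistentQueue<T>(front, back));
    }
}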
For maps, you need a concept called Zip
that lets you quickly build a representation of the structure where the element viewed at a particular point in an existing structure is different. So even when a desired mutation would require alteration of a lot of the underlying structure, this operation allows reuse of a lot more of the structure than would otherwise be possible. The node can point to different parent and child nodes, referencing the new part of the structure while embedded in as much of the prior version as possible.
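My own back-of-the-envelope C# sketch of that path-copying idea (not from the talk): updating one key rebuilds only the nodes on the search path and shares every untouched subtree with the previous version.
// Persistent (immutable) binary search tree: SetValue copies only the path from the root.
public sealed class TreeNode
{
    public TreeNode(int key, string value, TreeNode left = null, TreeNode right = null)
    {
        Key = key;
        Value = value;
        Left = left;
        Right = right;
    }

    public int Key { get; }
    public string Value { get; }
    public TreeNode Left { get; }
    public TreeNode Right { get; }

    public TreeNode SetValue(int key, string value)
    {
        if (key == Key)
        {
            // A new node for this key; both subtrees are shared with the old version.
            return new TreeNode(Key, value, Left, Right);
        }

        if (key < Key)
        {
            var newLeft = Left == null ? new TreeNode(key, value) : Left.SetValue(key, value);
            return new TreeNode(Key, Value, newLeft, Right);
        }

        var newRight = Right == null ? new TreeNode(key, value) : Right.SetValue(key, value);
        return new TreeNode(Key, Value, Left, newRight);
    }
}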
“Object-oriented programming makes it easier to reason about moving parts. Functional programming makes it easier to minimize moving parts.”
References:
Does diversity really matter?
Sombra González and Brigitte Hulliger — Video
This talk began by posing the following questions to the audience.
Good questions. Good topic. Mostly well-presented, although the middle dragged a bit: Sombra envisioned a (near-)future where women are the same as men in a tech world, a meritocracy. It didn’t add very much.
As with everywhere else, the software industry has to figure out how to deal with long maternity leaves. Some countries have introduced “rainbow” leaves, which allow sharing of the time between partners, so if the partner is male, the industry has to deal with male absence as well. That will probably help increase acceptance of female leave, as it removes the distinction.
For small companies, these kinds of extended leaves are a big hurdle because we can’t so easily absorb so much missing capacity.
We haven’t improved at all in the last quarter-century: there have been proportionally fewer women in technical software positions every year since 1991. The quit rate is much higher (41%) than for men (17%). This is not primarily due to family concerns, though. It’s mostly due to women not feeling comfortable in an industry where they’re often the only female in a meeting, on a team or in a company.
Reference:
The truth about code reviews
Sebastian Greulach — Video
This talk is a reduced version of the code-review talk that Sebastian has been doing for Encodo Systems in both English and German over the last year.
The presentation includes some statistics about the value of code reviews, a discussion of which benefits you can expect to get, which types of reviewers are likely to yield which benefits as well as Encodo’s approach and advice for integrating code reviews into your development process.
This was the most informative and amazing presentation at the entire show. All kidding aside, the room was packed and the ratings were quite good. There seemed to be a lot of interest in process.
Reference:
This guy was supremely entertaining. He is the undisputed master of the animated and reaction GIF in presentations. Informative, spirited and very funny.
…(var), and the tuple elements are unnamed (p1, p2, etc.). C# 6 is still like this, but C# 7 introduced named items for anonymous tuples.
…the UPDATE statement in his projects, where he can), and then you basically have an immutable data structure in a separate process with a really powerful and efficient query language over the graph. [3]
References:
A practical introduction to Category Theory
Daniela Sfregola — Video
Category theory is about Monads, examples of which are Option, Try and Future (promise).
The example she uses shows how to apply category-theory constructs to data-validation. The examples are in Scala, although the API that she presents looks very similar to the terminology used in Java’s Streams API, e.g. flatMap(). That’s SelectMany() for C# developers. Similarly, Option is Nullable, although I can’t think of the type analog for Some or None.
Her validation example is well-made, going from returning an Option, which is no better than a Boolean. Then she shows an Either, but that doesn’t allow for having both sides wrong. This can be done with Either, but it’s painful. That’s why we invented pattern-matching (now available in C# 7). Finally, she introduced Validated, which is capable of returning a list of errors. “Focus on how things compose.”
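The same idea translates readily to C#. Here is a tiny sketch of my own (not her Scala code) of a Validated-style result that accumulates errors instead of stopping at the first one:
using System;
using System.Collections.Generic;
using System.Linq;

// Either a value or a list of errors; combining two failures concatenates the errors.
public sealed class Validated<T>
{
    private Validated(T value, IReadOnlyList<string> errors)
    {
        Value = value;
        Errors = errors;
    }

    public T Value { get; }
    public IReadOnlyList<string> Errors { get; }
    public bool IsValid => Errors.Count == 0;

    public static Validated<T> Valid(T value) => new Validated<T>(value, Array.Empty<string>());
    public static Validated<T> Invalid(params string[] errors) => new Validated<T>(default(T), errors);
}

public static class Validated
{
    // Combine two independent validations; errors from both sides are kept.
    public static Validated<(TA, TB)> Zip<TA, TB>(Validated<TA> a, Validated<TB> b)
    {
        return a.IsValid && b.IsValid
            ? Validated<(TA, TB)>.Valid((a.Value, b.Value))
            : Validated<(TA, TB)>.Invalid(a.Errors.Concat(b.Errors).ToArray());
    }
}
With hypothetical validators, Validated.Zip(ValidateName(name), ValidateAge(age)) would then report a bad name and a bad age at the same time.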
The talk was quite short and didn’t introduce much new. The pattern-matching syntax in Scala is a bit wordy.
g º f patterns
Mario Fusco — Video
Since my previous talk was done early, I joined Sebastian in this one. I saw only the tail-end of it, but man are the Streams libraries still really wordy. Welcome to functional programming, Java! Still, I’m disappointed that I can’t use streams() in the Android project I’m working on because it requires Java 8, which forces API level 24, which excludes a lot of devices.
Sebastian said the talk was pretty good.
What about CSS? Progressive Enhancement and CSS
Ire Aderinokun — Video
Rules:
WTF is the squirrel browser? (It turns out it’s UC Browser, popular in China.) Or the one with the strange globe? (Maybe Flock? Not sure.) Does Opera really have higher market-share than IE? Probably globally, right? Phone browser in India/China/etc.
She showed a really cool graph of how many hours you have to work to use 500MB of data. Germany: 1h, Brazil: 56h, US: 6h. Bandwidth matters. A lot. WWW != Wealthy Western Web ammirite?
…<main> or <header>.
More rules:
…vertical-align is ignored when flexing is enabled.)
What about the future of the web? VR? Old devices handed down from the 1st to the 3rd world.
I asked about testing that the progressive enhancements work as programmed, but no-one had any new ideas for testing. Manual testing is the only way to verify that the enhancements and fallbacks work.
References:
I just hacked your app!
Marcos Placona — Video
He started off the talk as a bandit, reverse-engineering a Base64-encoded name/password. He used Charles to get MITM. It was a nice trick, and it probably works on a lot of devices and apps.
It’s very easy to make a hackable application if you don’t think about security. He uses a nice word-definition slide with pronunciation and usage to make it look all official.
CertificatePinner()
Published by marco on 6. Feb 2017 00:10:55 (GMT-5)
As Microsoft did a couple of years ago, Apple’s language designers are also designing the next version of Swift in public. [1] One example of the new design is the discussion of String Processing For Swift 4 (GitHub). If you read through the relatively long document, you can at least see that they’re giving the API design a tremendous amount of thought.
There are so many factors to weigh when building the API, especially for a low-level construct like String.
- …the String API with a bunch of overloads? (E.g. the discussion of storage for sub-strings.)
- …(Strings are actually structs rather than classes.)
- …Array?
- Should String be a Collection? If so, what is the default item-type?
- Should Character have the same or a similar API as a String? (E.g. why can’t you get the sub-structure of the grapheme cluster for a character without first casting it to a String?)
A good example is the discussion of how to represent string slices: should there be a separate type, called Substring, analogous to the ArraySlice that already exists for an Array?
“Long-term storage of Substring instances is discouraged. A substring holds a reference to the entire storage of a larger string, not just to the portion it presents, even after the original string’s lifetime ends.
“[…]
“The downside of having two types is the inconvenience of sometimes having a Substring when you need a String, and vice-versa. It is likely this would be a significantly bigger problem than with Array and ArraySlice, as slicing of String is such a common operation. It is especially relevant to existing code that assumes String is the currency type – that is, the default string type used for everyday exchange between APIs. To ease the pain of type mismatches, Substring should be a subtype of String in the same way that Int is a subtype of Optional<Int>.”
Collection
or not?For those that watch as the API for Swift evolves from one major version to another—with each change introducing non–backward-compatible incompatibilities—this document should hopefully reassure them that the changes are not made lightly. It may seem like the designers don’t have a plan, but, over the years, designers and opinions change. E.g. Witness the discussion of what the default representation of the string should be.
“[…] in Swift 1.0, String was a collection of Character (extended grapheme clusters). […] In Swift 2.0, String’s Collection conformance was dropped, because we convinced ourselves that its semantics differed from those of Collection too significantly.”
After listing several reasons why the change in Swift 2.0 was not a good direction, they conclude that in 4.0, they should revert to the original behavior.
“It would be much better to legitimize the conformance to Collection and simply document the oddity of any concatenation corner-cases, than to deny users the benefits on the grounds that a few cases are confusing.”
Again, the discussion is open and public and, despite the claims of some who think that they’re just a bunch of cowboys changing stuff willy-nilly, they have a documented plan.
It’s unfortunate that it took them so long to get there, but this kind of design isn’t always easy.
Because Swift uses Unicode grapheme clusters as the default “items” view for strings, the discussion of string indices might seem unnecessarily abstract for developers coming from other languages, where the index is always an int into bytes.
“String currently has four views–characters, unicodeScalars, utf8, and utf16 […]”
Because of these different views, it’s necessary to discuss how to reduce API surface by consolidating the various index types used to refer to individual elements in these different “views” on a String
.
It’s not like C#—and most other mainstream languages—have anything to brag about with their string-handling. In that respect, even Swift 1 and 2 are light-years ahead in Unicode correctness with their focus on grapheme clusters rather than the utterly nonsensical 90s-era bytes
still used in those other languages.
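To make that concrete, here is a quick C# illustration of my own: String.Length counts UTF-16 code units, while System.Globalization.StringInfo gets you closer to user-perceived characters.
using System;
using System.Globalization;

class GraphemeDemo
{
    static void Main()
    {
        // "é" written as 'e' plus a combining acute accent: one user-perceived character.
        var text = "re\u0301sume\u0301";

        Console.WriteLine(text.Length);                               // 8 UTF-16 code units
        Console.WriteLine(new StringInfo(text).LengthInTextElements); // 6 text elements

        var enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            // Each element is a base character plus its combining marks.
            Console.WriteLine(enumerator.GetTextElement());
        }
    }
}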
The Guidance for API Designers shows how they try to build the API so that it makes sense for callers.
“A Substring passed where String is expected will be implicitly copied. When compared to the “same type, copied storage” model, we have effectively deferred the cost of copying from the point where a substring is created until it must be converted to String for use with an API.
“A user who needs to optimize away copies altogether should use this guideline: if for performance reasons you are tempted to add a Range argument to your method as well as a String to avoid unnecessary copies, you should instead use Substring.”
Their goal is noble, though it’s unclear to what degree the vision can be realized. The following citation could be written as the high-level goal of any API.
“We should represent these aspects as orthogonal, composable components, abstracting pattern matchers into a protocol like this one, that can allow us to define logical operations once, without introducing overloads, and massively reducing API surface area.”
Update: At the suggestion of a reader, I searched... [More]
]]>Published by marco on 4. Feb 2017 18:17:03 (GMT-5)
Updated by marco on 5. Feb 2017 23:42:56 (GMT-5)
I encountered some curious behavior while writing a service-locator interface (protocol) in Swift. I’ve reproduced the issue in a stripped-down playground [1] and am almost certain I’ve found a bug in the Swift 3.0.1 compiler included in XCode 8.2.1.
Update: At the suggestion of a reader, I searched and found Apple’s Jira for Swift [2] and reported this issue as A possible tuple-inference/parameter-resolution bug in Swift 3.0.1
We’ll start off with a very basic example, shown below.
The example above shows a very simple function, generic in its single parameter with a required argument label a:
. As expected, the compiler determines the generic type T
to be Int
.
I’m not a big fan of argument labels for such simple functions, so I like to use the _
to free the caller from writing the label, as shown below.
As you can see, the result of calling the function is unchanged.
Let’s try calling the function with some other combinations of parameters and see what happens.
If you’re coming from another programming language, it might be quite surprising that the Swift compiler happily compiles every single one of these examples. Let’s take them one at a time.
- int: This works as expected.
- odd: This is the call that I experienced in my original code. At the time, I was utterly mystified how Swift—a supposedly very strictly typed language—allowed me to call a function with a single parameter with two parameters. This example’s output makes it more obvious what’s going on here: Swift interpreted the two parameters as a Tuple. Is that correct, though? Are the parentheses allowed to serve double-duty both as part of the function-call expression and as part of the tuple expression?
- tuple: With two sets of parentheses, it’s clear that the compiler interprets T as the tuple (Int, Int).
- labels: The issue with double-duty parentheses isn’t limited to anonymous tuples. The compiler treats what looks like two labeled function-call parameters as a tuple with two Ints labeled a: and b:.
- nestedTuple: The compiler seems to be playing fast and loose with parentheses inside of a function call. The compiler sees the same type for the parameter with one, two and three sets of parentheses. [3] I would have expected the type to be ((Int, Int)) instead.
- complexTuple: As with tuple, the compiler interprets the type for this call correctly.
The issue with double-duty parentheses seems to be limited to function calls without argument labels. When I changed the function definition to require a label, the compiler choked on all of the calls, as expected. To fix the problem, I added the argument label for each call and you can see the results below.
- int: This works as expected.
- odd: With an argument label, instead of inferring the tuple type (Int, Int), the compiler correctly binds the label to the first parameter 1. The second parameter 2 is marked as an error.
- tuple: With two sets of parentheses, it’s clear that the compiler interprets T as the tuple (Int, Int).
- labels: This example behaves the same as odd, with the second parameter b: 2 flagged as an error.
- nestedTuple: This example works the same as tuple, with the compiler ignoring the extra set of parentheses, as it did without an argument label.
- complexTuple: As with tuple, the compiler interprets the type for this call correctly.
I claimed above that I was pretty sure that we’re looking at a compiler bug here. I took a closer look at the productions for tuples and functions defined in The Swift Programming Language (Swift 3.0.1) manual available from Apple.
First, let’s look at tuples:
As expected, a tuple expression is created by surrounding zero or more comma-separated expressions (with optional identifiers) in parentheses. I don’t see anything about folding parentheses in the grammar, so it’s unclear why (((1)))
produces the same type as (1)
. Using parentheses makes it a bit difficult to see what’s going on with the types, so I’m going to translate to C# notation.
() => empty tuple [4]
(1) => Tuple<int>
((1)) => Tuple<Tuple<int>>
This seems to be a separate issue from the second, but opposite, problem: instead of ignoring parentheses, the compiler allows one set of parentheses to simultaneously denote the argument clause of a single-arity function call and an argument of type Tuple
encompassing all parameters.
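For comparison, and to stay with the C# notation used above, here is a sketch of my own showing that the C# compiler will not let one set of parentheses do that double duty: a single-parameter generic method called with two arguments simply does not compile.
using System;

static class TupleInference
{
    static string Test<T>(T a) => typeof(T).ToString();

    static void Main()
    {
        Console.WriteLine(Test(1));       // System.Int32
        Console.WriteLine(Test((1, 2)));  // System.ValueTuple`2[System.Int32,System.Int32]
        // Console.WriteLine(Test(1, 2)); // error CS1501: no overload for 'Test' takes 2 arguments
    }
}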
A look at the grammar of a function call shows that the parentheses are required.
Nowhere did I find anything in the grammar that would allow the kind of folding I observed in the compiler, as shown in the examples above. I’m honestly not sure how that would be indicated in grammar notation.
Given how surprising the result is, I can’t imagine this is anything but a bug. Even if it can be shown that the Swift compiler is correctly interpreting these cases, it’s confusing that the type-inference is different with and without labels.
The X-Code playground is a very decent REPL for this kind of example. Here’s the code I used, if you want to play around on your own.
func test<T>(_ a: T) -> String
{
return String(describing: type(of: T.self))
}
var int = test(1)
var odd = test(1, 2)
var tuple = test((1, 2))
var labels = test(a: 1, b: 2)
var nestedTuple = test(((1, 2)))
var complexTuple = test((1, (2, 3)))
Published by marco on 15. Jan 2017 23:40:49 (GMT-5)
Updated by marco on 4. Oct 2023 21:24:02 (GMT-5)
The article Dark Path by Robert C. Martin was an interesting analysis of a recent “stricter” trend in programming languages, as evidenced by Swift and Kotlin. I think TypeScript is also taking some steps along this path, as well as Rust, which I have read a lot about, but haven’t had much occasion to use.
The point Martin makes is that all of these languages seem to be heedlessly improving correctness at the possible cost of expressiveness and maintainability. That is, as types are inferred from implementation, it can become more difficult to pinpoint where the intent of the programmer and the understanding of the compiler parted ways. As well, with increasing strictness—e.g. non-null references, reference-ownership, explicit exceptions, explicit overrides—there comes increasing overhead in maintaining code.
Not only that, but developers must know their types—and hence their design—up front, which restricts evolving design as practiced in the very successful TDD approach and seems to be headed back to the stone age of waterfall design. As well, that level of strictness convinces developers—who are similarly encouraged by the language designers—that once their code compiles, then it runs as expected.
But then they think they don’t need to test, whereas the compiler really has no idea whether your code does what it should do. All it can guarantee is that no exception went unhandled—or explicitly ignored—(e.g. in Kotlin or Swift) or there are no race conditions or deadlocks (Rust) or that there are no null references where not explicitly programmed (Swift, Kotlin, TypeScript).
These compiler-enforced language features are very useful, but are in the same class as the spell-checker in your text editor. Having no red, wavy lines in your document is no guarantee that the document makes any sense whatsoever.
So these are interesting and useful features. They can lead to increased safety. But, they won’t make your program do what it’s supposed to do. At best, they help you avoid writing behavior that you most definitely don’t want.
These features are nice to have, but they are not worth having at any price.
It was an interesting article that I more-or-less agreed with. The follow-up article Types and Tests by Robert C. Martin (Clean Coder Blog) followed close on its heels because Martin apparently wanted to respond to feedback he’d received on the first article. I thought he went a bit far in the second article. For example, he emphasized that,
“No, types are not tests. Type systems are not tests. Type checking is not testing. Here’s why.”
That’s absolutely true, but types are still related to testing. Types help me specify my interface more precisely and I can trust the compiler to enforce them. That’s a lot of tests I don’t have to write.
Otherwise, for every API I write, I’d have to write tests to prove that only the supported types can be passed in—and I’d also have to specify how my API behaves when value with an incorrect type is passed in. Do I fail silently? How do I let the caller know what to expect? This seems not only sloppy but time-consuming. It sounds like busy work, having to think about this kind of stuff for every API.
Martin continues,
“[…] the way f is called has nothing to do with the required behavior of the system. Rather it is a test of an arbitrary constraint imposed by the programmer. A constraint that was likely over[-]specified from the point of view of the system requirements. (Emphasis added.)”
The first sentence is a useful observation. The second is hyperbole. Indicating int rather than object for a parameter called limit hardly seems like an over-specification. In fact, it seems like exactly what I want.
If the requirement says shall allow a user to enter a value for limit… rather than shall allow a user to enter a positive number for limit…, then I would argue that 99% of the time it’s the requirement that isn’t precise enough. I would not assume that the requirements engineer knew just what she was doing when she left the door open for a limit given as a string
.
Without types, our requirements would also become bloated with over-definitions like:
- Throw an ArgumentOutOfRangeException for values that are less than zero or greater than 1000.
- Throw a ClassCastException if the given value cannot be marshaled to a numeric value.
For this specification, a developer could write:
public void SetLimit(object limit)
{
  int limitAsNumber;
  if (!Int32.TryParse(limit?.ToString(), out limitAsNumber))
  {
    throw new ClassCastException("…");
  }
  if (limitAsNumber < 0 || limitAsNumber > 1000)
  {
    throw new ArgumentOutOfRangeException("limit");
  }
  _limit = limitAsNumber;
}
The developer could also write:
public void SetLimit(UInt32 limit)
{
  if (limit > 1000)
  {
    throw new ArgumentOutOfRangeException("limit");
  }
  _limit = limit;
}
That’s actually what we want the developer to write, no? If you choose JavaScript to implement this requirement, then you would need to over-specify because you need to decide how to handle values with unsupported types. If the requirements engineer is allowed to assume that the implementing language has a minimal type system, then the requirements are also easier to write, as shown below.
- Throw an ArgumentOutOfRangeException for values that are less than zero or greater than 1000.
- Throw a ClassCastException if the given value cannot be marshaled to a numeric value.
Assuming a minimal type system in the target language saves time and effort. The requirements engineer can specify more concisely and the software engineer wastes less time writing boilerplate that has nothing to do with application behavior.
Martin finished up with this sentiment,
“So, no, type systems do not decrease the testing load. Not even the tiniest bit. But they can prevent some errors that unit tests might not see. (e.g. Double vs. Int) (Emphasis added.)”
As you can imagine, I strongly disagree with the “[n]ot even the tiniest bit” part, based on my arguments above. If you use JavaScript, then you have to test all valid input and verify its behavior. In JavaScript, literally any data is valid input and it’s up to your method to declare it invalid.
Only tests can provide any protection against your method being called at runtime with invalid data. You have to write a test to verify that your method throws an error when passed a double
rather than an int
. Most people will not write these kinds of tests, which I suspect is why Martin says there’s no change in testing load.
I agree that the pendulum in Swift has swung too far in a restrictive direction. The language does feel pretty overloaded. I also agree that the behavior of the system itself needs to be tested and that types don’t help you there.
Martin again,
“On the other hand, internal self-consistency does not mean the program exhibits the correct behavior. Behavior and self-consistency are orthogonal concepts. Well behaved programs can be, and have been, written in languages with high ambiguity and low internal consistency. Badly behaved programs have been written in languages that are deeply self-consistent and tolerate few ambiguities. (Emphasis added.)”
Agreed.
I think, though, that Martin might be forgetting about all of the people writing software who aren’t the kind of people who can write a well-behaved program in a wildly inconsistent language. I, for example, am so awesome [1] that I wrote my entire web-site software in PHP—one of the worst languages in the world for internal self-consistency—and it’s been running my site for going on 18 years. Programming skill and iron discipline fill the gap left by language consistency.
But for bad programmers? They write utter garbage in PHP. Maybe it’s not a bad idea to create languages that channel poorly disciplined programmers into better practices. I take the point from the previous article (Dark Path) that bad programmers will simply work their way around the rigor, where possible. They will mark every class as open
in Swift instead of thinking about their architecture.
For those of us with discipline, the language will put up roadblocks that force us to write more code rather than less.
As a counterexample, there is Rust, which enforces reference-ownership in a way that guarantees concurrent code with no deadlocks and no race conditions. This is a good thing. It probably gets in your way when you’re trying to write other types of programs, but it’s overall a good thing.
I haven’t had any personal experience with it, but I’ve heard that it’s sometimes difficult to figure out why a given program won’t compile. I would hope that these situations become fewer with experience, but would also be cautious because I remember programming in C++ with templates and know how much time can be lost when you don’t know how to fix your program based on an error message.
I, for one, like that my compiler tells me when I have potential null-reference exceptions. I use attributes in C# to tell me exactly that and I use R# to find all places in my code where I have potential violations. Those are more tests that I don’t have to write, if the compiler can “prove” that this code is never called with a null
reference. [2] It lets me write more concise implementation and spares me a lot of scaffolding.
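For example (a minimal sketch of my own, assuming the JetBrains.Annotations package that R# understands):
using JetBrains.Annotations;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class PersonFormatter
{
    // R# warns at any call site that can pass null here and flags redundant
    // null checks inside the method; no unit test needed for the null case.
    public string Format([NotNull] Person person)
    {
        return person.LastName + ", " + person.FirstName;
    }

    // The inverse contract: callers are warned if they dereference the result
    // without checking it first.
    [CanBeNull]
    public Person FindByName([NotNull] string lastName)
    {
        return null; // e.g. not found
    }
}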
Many years ago, I had the same experience with const
in C++ as Martin discusses. After some time working with const
, I started making everything I possibly could const
in order to eliminate a whole class of mutation errors in my code. That did have consequences, at the time. Changing one thing could—as Martin describes for his hypothetical language TDP—lead to knock-on changes throughout the code base.
Generics can have this effect, as well, with changes leaking into all of the places they’re used. I wrote a blog series on having pulled back from generics in a few central places in Quino.
I often felt the way that Martin does about Java’s throws
declaration. I imagine that I’ll start to feel the same about Swift’s, as well. I read once about a nice typing system in Midori, the managed version of Windows created by Joe Duffy and team at Microsoft Research, that I felt I would like to try (no pun intended).
Martin says that he uses both dynamically and statically typed languages. He acknowledges that certain extensions to the type system can be useful (but just that some languages have gone too far).
I, too, think some innovations can be very helpful. I like immutables (types, declarations, whatever) because they let me reason better about my code. They let me eliminate unwanted code paths with the compiler rather than having to write more rote tests that I think even Martin will agree have nothing to do with the original specification or the behavior of my application.
If I can mark something as readonly because I don’t expect it to ever need to be changed, that’s a little note I’ve left for future programmers that, should they want to modify that value, they will have to make sure to reason differently about the implementation. The value was never intended to be rewritten and there are no tests for that behavior. It’s a nice way of reducing the scope of the implementation.
It simultaneously restricts that scope, but that’s a good thing. A program can, very quickly, do a lot of things that it should not do. I don’t want to write tests for all of this stuff. I have neither the inclination nor the time—nor the budget—to write tests for things that I could instead eliminate entirely from the realm of possibility with a powerful type system.
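A trivial C# illustration of that note-to-the-future (mine, not Martin’s):
public class ReportGenerator
{
    // readonly: assigned once in the constructor and never rewritten afterwards.
    // Anyone who wants to mutate this later has to change the declaration and
    // re-think the reasoning that assumed it was fixed.
    private readonly string _templateName;

    public ReportGenerator(string templateName)
    {
        _templateName = templateName;
    }

    public string Describe() => $"Report based on '{_templateName}'";
}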
I read up on Kotlin and saw a seminar on it last year. I, too, noticed that there seems to be an “everything but the kitchen sink” feel to it. It’s the same feeling I get when I look at Scala’s type system, though that one is less about restriction than about letting you do everything in 3 different ways.
I’ve been reading through the Swift language guide and I’m getting the same feeling. It doesn’t help that they have their own name and keyword for nearly every commonly known programming concept. You can use self.
but the guide prefers just .
, which takes some getting used to. finally
? Nope. Use defer
instead.
To be honest, I’m also a bit dizzy at how quickly the TypeScript type system has gotten more and more complex. TypeScript 2.1: keyof and Lookup Types by Marius Schulz includes details on even more typing specifications that let you infer types from dynamic objects with flow-control analysis.
I think this is quite an interesting approach, akin to more functional languages, like ML and F#, where return types are inferred and even parameter types are inferred. Swift has also gone a long way in this direction. Interfaces are replaced with non-inheritable types that describe the shape of data.
Types can even be inferred by which fields you access within conditionals so that a single variable has a different inferred type depending on which path through the code it takes. It’s all very exciting, but I wonder how much can be used correctly—especially by the aforementioned crappy programmers.
For example, this is the definition for the Object.entries()
method from JavaScript.
interface ObjectConstructor {
  // …
  entries<T extends { [key: string]: any }, K extends keyof T>(o: T): [keyof T, T[K]][];
  // …
}
After having used languages that have explicit return types for methods, I’m still a bit at sea when I read TypeScript code without them. I find myself hovering identifiers to see which type was inferred for them by the real-time compilation.
I agree that the code is cleaner, but maybe something’s gone missing. It’s harder to tell what the hell I’m supposed to pass in as a parameter or what the hell I get back from a function when the type can be a union of 3 or 4 other vaguely and sometimes ad-hoc–defined types.
For example, a lot of code just constantly redefines the hash-table interface rather than just defining a type for it … so the caller isn’t restricted to implementing a specific interface. This is nice for library code, I guess, but it makes it harder to reason about the code because you don’t have good names for types. This is an interesting enough experience for seasoned programmers; I can’t even imagine how average or bad programmers deal with it.
I see where Martin is coming from, that he’s afraid of BDUF, something he’s been fighting for years by arguing that you can design as you go if you’ll just test your code as you write it. If you see that a parameter has to be an IHashMap
, that’s easier to understand than { [key: string]: any }
or { [key: string]: T }
where T is a completely different type. There are advantages and disadvantages.
“Every step down that path increases the difficulty of using and maintaining the language. Every step down that path forces users of the language to get their type models “right” up front; because changing them later is too expensive. Every step down that path forces us back into the regime of Big Design Up Front.”
I agree with the sentiment, but I don’t know if we’re there yet. Martin argues that there is a balance and maybe I need more experience with the languages he’s horrified about. He does write:
“I think Java and C# have done a reasonable job at hovering near the balance point. (If you ignore the horrible syntax for generics, and the ridiculous proscription against multiple inheritance.)”
…which I agree with wholeheartedly. I have learned to live without multiple inheritance, but I regularly railed against its absence for decades. I have given up because the world has moved on. I would love to see proper contravariance and covariant return types and anchored types, but I’ve kind of given up on seeing that kind of stuff in a mainstream language, as well. Instead, I’ve drifted more toward immutable, stateless, functional style—even in C#. I’m ogling F#. I’m working with Swift now and will do much more of that this year.
If you don’t have a license for DataGrip, you can download... [More]
]]>Published by marco on 11. Jan 2017 08:47:45 (GMT-5)
The article Connecting DataGrip to MS SQL Server by Maksim Sobolevskiy on June 21, 2016 (JetBrains Blog) covers all of the points well, with screenshots, but I just wanted to record my steps, collected into a tight list. Screenshots for most of these steps are available in the blog linked above.
If you don’t have a license for DataGrip, you can download a 30-day trial or you can download the JetBrains Rider EAP, which bundles it. Once Rider is released, you’ll have to have a license for it, but—for now—you can use it for free.
We’ll get... [More]
]]>Published by marco on 24. Nov 2016 20:02:47 (GMT-5)
For many years, the C#/.NET world has been dominated by a single main IDE: Visual Studio. MonoDevelop has also been available for a while, as an alternative for users on other platforms. Lately, though, there have been a few new contenders in the .NET IDE arena.
We’ll get this one out of the way first: this is basically Xamarin Studio for Mac, rebranded as Visual Studio for Mac. This IDE is pretty and extremely well-integrated into MacOS, with a lot of animated editor interaction for compiler warnings and errors.
Unlike Rider or Visual Studio 2017 with ReSharper, Xamarin Studio doesn’t benefit from the R# tooling, so there are a few things immediately missing. Navigation is not as smooth as with ReSharper-based IDEs [1], although it’s definitely on-par with what I’ve experienced in Xcode. Xamarin Studio is fast and pretty good and I’ll definitely keep it in the mix for testing Quino on alternate platforms once we start the move to .NET Standard 2.0. [2]
This is only an EAP, so keep that in mind when testing. I installed this IDE on my Mac and Windows. The setup process was very smooth, asking for theme/color preferences and—most importantly—keyboard preferences. This time, the key-mapping for “Visual Studio” turned out to be quite appropriate and good.
I was able to load the Quino solution relatively quickly. The first load kicks off two processes: Nuget Restore and Process Files. On subsequent loads, the Nuget Restore no longer applies and Process Files benefits from Rider having cached everything the first time around.
I couldn’t find any option to add an extra NuGet source, which was odd. There is a tab in the “Nuget Packages” pane called “sources”, but it just lists the NuGet configuration files but doesn’t offer any way to add sources.
On the plus side, the test runner worked immediately. On the minus side, it delivered results inconsistent with VS2015 and VS2017 running on the same machine. It looks and behaves like the same test runner as in ReSharper [3], but the results are different for some (a few hundred) Quino tests.
It loads quickly, can deal with the Quino solution without issues and the test runner works. Everything else felt like Visual Studio with ReSharper—at least for the stuff I use. I’ll keep an eye on this IDE.
I installed this with ReSharper 2016.3EAP9 and was pleasantly surprised to see that it behaved like an actual RC. That is, instead of releasing Alpha/early-beta software as an RC—I’m looking at you, .NET Core—they’ve got a really solid release on their hands.
That said, it’s not quite ready for production use (obvious from the RC moniker) but I was able to use it for productive use over a long weekend. So I was pretty encouraged that I’ll be able to let the guys at Encodo use it sooner rather than later. [4]
That said, here are the things I’ve noticed that are missing:
Everything else seemed to work fine, which speaks well of both VS2017 and R#’s latest EAP.
We discussed ABD in a recent article ABD: Refactoring and refining an API. To cite from that article,
]]>“[…] the most important part of code is to think about how you’re writing it and what you’re building. You shouldn’t write a single line without thinking of the myriad ways in... [More]”
Published by marco on 5. Jun 2016 12:52:31 (GMT-5)
We discussed ABD in a recent article ABD: Refactoring and refining an API. To cite from that article,
“[…] the most important part of code is to think about how you’re writing it and what you’re building. You shouldn’t write a single line without thinking of the myriad ways in which it must fit into existing code and the established patterns and practices.”
With that in mind, I saw another teaching opportunity this week and wrote up my experience designing an improvement to an existing API.
Before we write any code, we should know what we’re doing. [1]
- We use aspects (IMetaAspects) in Quino to add domain-specific metadata (e.g. the IVisibleAspect controls element visibility).
- The existing API for getting an aspect is FindOrAddAspect(). This method does what it advertises: if an aspect with the requested type already exists, it is returned; otherwise, an instance of that type is created, added and returned. The caller gets an instance of the requested type (e.g. IVisibleAspect).
A good example is the IClassCacheAspect
. It exposes five properties, four of which are read-only. You can modify the property (OrderOfMagnitude
) through the interface. This is already not good, as we are forced to work with the implementation type in order to change any property other than OrderOfMagnitude
.
The current way to address this issue would be to make all of the properties settable on the interface. Then we could use the FindOrAddAspect()
method with the IClassCacheAspect
. For example,
var cacheAspect =
  Element.Classes.Person.FindOrAddAspect<IClassCacheAspect>(
    () => new ClassCacheAspect()
  );

cacheAspect.OrderOfMagnitude = 7;
cacheAspect.Capacity = 1000;
For comparison, if the caller were simply creating the aspect instead of getting a possibly-already-existing version, then it would just use an object initializer.
var cacheAspect = Element.Classes.Person.Aspects.Add(
  new ClassCacheAspect()
  {
    OrderOfMagnitude = 7,
    Capacity = 1000
  }
);
This works nicely for creating the initial aspect. But it causes an error if an aspect of that type had already been added. Can we design a single method with all the advantages?
A good way to approach a new API is to ask: How would we want the method to look if we were calling it?
Element.Classes.Person.SetCacheAspectValues(
  a =>
  {
    a.OrderOfMagnitude = 7;
    a.Capacity = 1000;
  }
);
If we only want to change a single property, we can use a one-liner:
Element.Classes.Person.SetCacheAspectValues(a => a.Capacity = 1000);
Nice. That’s even cleaner and has fewer explicit dependencies than creating the aspect ourselves.
Now that we know what we want the API to look like, let’s see if it’s possible to provide it. We request an interface from the list of aspects but want to use an implementation to set properties. The caller has to indicate how to create the instance if it doesn’t already exist, but what if it does exist? We can’t just downcast it because there is no guarantee that the existing aspect is the same implementation.
These are relatively lightweight objects and the requirement above is that the property values on the existing aspect are set on the returned aspect, not that the existing aspect is preserved.
What if we just provided a mechanism for copying properties from an existing aspect onto the new version?
var cacheAspect = new ClassCacheAspect();
var existingCacheAspect =
  Element.Classes.Person.Aspects.FirstOfTypeOrDefault<IClassCacheAspect>();
if (existingCacheAspect != null)
{
  cacheAspect.OrderOfMagnitude = existingCacheAspect.OrderOfMagnitude;
  cacheAspect.Capacity = existingCacheAspect.Capacity;
  // Set all other properties
}

// Set custom values
cacheAspect.OrderOfMagnitude = 7;
cacheAspect.Capacity = 1000;
This code does exactly what we want and doesn’t require any setters on the interface properties. Let’s pack this away into the API we defined above. The extension method is:
public static ClassCacheAspect SetCacheAspectValues(
  this IMetaClass metaClass,
  Action<ClassCacheAspect> setValues)
{
  var result = new ClassCacheAspect();
  var existingCacheAspect =
    metaClass.Aspects.FirstOfTypeOrDefault<IClassCacheAspect>();
  if (existingCacheAspect != null)
  {
    result.OrderOfMagnitude = existingCacheAspect.OrderOfMagnitude;
    result.Capacity = existingCacheAspect.Capacity;
    // Set all other properties
  }

  setValues(result);

  return result;
}
So that takes care of the boilerplate for the IClassCacheAspect
. It hard-codes the implementation to ClassCacheAspect
, but let’s see how big a restriction that is once we’ve generalized below.
We want to see if we can do anything about generalizing SetCacheAspectValues()
to work for other aspects.
Let’s first extract the main body of logic and generalize the aspects.
public static TConcrete SetAspectValues<TService, TConcrete>(
  this IMetaClass metaClass,
  Action<TConcrete, TService> copyValues,
  Action<TConcrete> setValues
)
  where TConcrete : TService, new()
  where TService : IMetaAspect
{
  var result = new TConcrete();
  var existingAspect = metaClass.Aspects.FirstOfTypeOrDefault<TService>();
  if (existingAspect != null)
  {
    copyValues(result, existingAspect);
  }

  setValues(result);

  return result;
}
This isn’t bad, but we’ve required that the TConcrete
parameter implement a default constructor. Instead, we could require an additional parameter for creating the new aspect.
public static TConcrete SetAspectValues<TService, TConcrete>(
  this IMetaClass metaClass,
  Func<TConcrete> createAspect,
  Action<TConcrete, TService> copyValues,
  Action<TConcrete> setValues
)
  where TConcrete : TService
  where TService : IMetaAspect
{
  var result = createAspect();
  var existingAspect = metaClass.Aspects.FirstOfTypeOrDefault<TService>();
  if (existingAspect != null)
  {
    copyValues(result, existingAspect);
  }

  setValues(result);

  return result;
}
Wait, wait, wait. We not only don’t need the new() generic constraint, we also don’t need the createAspect lambda parameter, do we? Can’t we just pass in the object instead of passing in a lambda to create the object and then calling it immediately?
public static TConcrete SetAspectValues<TService, TConcrete>(
  this IMetaClass metaClass,
  TConcrete aspect,
  Action<TConcrete, TService> copyValues,
  Action<TConcrete> setValues
)
  where TConcrete : TService
  where TService : IMetaAspect
{
  var existingAspect = metaClass.Aspects.FirstOfTypeOrDefault<TService>();
  if (existingAspect != null)
  {
    copyValues(aspect, existingAspect);
  }

  setValues(aspect);

  return aspect;
}
That’s a bit more logical and intuitive, I think.
We can now redefine our original method in terms of this one:
public static ClassCacheAspect SetCacheAspectValues(
  this IMetaClass metaClass,
  Action<ClassCacheAspect> setValues)
{
  return metaClass.SetAspectValues<IClassCacheAspect, ClassCacheAspect>(
    new ClassCacheAspect(),
    (aspect, existingAspect) =>
    {
      aspect.OrderOfMagnitude = existingAspect.OrderOfMagnitude;
      aspect.Capacity = existingAspect.Capacity;
      // Set all other properties
    },
    setValues
  );
}
Can we somehow generalize the copying behavior? We could make a wrapper that expects an interface on the TService
that would allow us to call CopyFrom(existingAspect)
.
public static TConcrete SetAspectValues<TService, TConcrete>(
  this IMetaClass metaClass,
  TConcrete aspect,
  Action<TConcrete> setValues
)
  where TConcrete : TService, ICopyTarget
  where TService : IMetaAspect
{
  return metaClass.SetAspectValues<TService, TConcrete>(
    aspect,
    (target, existingAspect) => target.CopyFrom(existingAspect),
    setValues
  );
}
What does the ICopyTarget
interface look like?
public interface ICopyTarget
{
  void CopyFrom(object other);
}
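With this non-generic version, an implementation would look roughly like the following sketch (mine, reusing the ClassCacheAspect from above).
public class ClassCacheAspect : IClassCacheAspect, ICopyTarget
{
  public void CopyFrom(object other)
  {
    // Boilerplate: every implementation has to check and cast first.
    if (!(other is IClassCacheAspect otherAspect))
    {
      throw new ArgumentException("Expected an IClassCacheAspect", nameof(other));
    }

    OrderOfMagnitude = otherAspect.OrderOfMagnitude;
    Capacity = otherAspect.Capacity;
    // Set all other properties
  }
}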
This is going to lead to type-casting code at the start of every implementation to make sure that the other
object is the right type. We can avoid that by using a generic type parameter instead.
public interface ICopyTarget<T>
{
  void CopyFrom(T other);
}
That’s better. How would we use it? Here’s the definition for ClassCacheAspect
:
public class ClassCacheAspect : IClassCacheAspect, ICopyTarget<IClassCacheAspect>
{
  public void CopyFrom(IClassCacheAspect otherAspect)
  {
    OrderOfMagnitude = otherAspect.OrderOfMagnitude;
    Capacity = otherAspect.Capacity;
    // Set all other properties
  }
}
Since the final version of ICopyTarget
has a generic type parameter, we need to adjust the extension method. But that’s not a problem because we already have the required generic type parameter in the outer method.
public static TConcrete UpdateAspect<TService, TConcrete>(
this IMetaClass metaClass,
TConcrete aspect,
Action<TConcrete> setValues
)
where TConcrete : TService, ICopyTarget<TService>
where TService : IMetaAspect
{
return metaClass.UpdateAspect(
aspect,
(target, existingAspect) => target.CopyFrom(existingAspect),
setValues
);
}
Assuming that the implementation of ClassCacheAspect implements ICopyTarget as shown above, then we can rewrite the cache-specific extension method to use the new extension method for ICopyTargets.
public static ClassCacheAspect SetCacheAspectValues(
this IMetaClass metaClass,
Action<ClassCacheAspect> setValues)
{
return metaClass.UpdateAspect<IClassCacheAspect, ClassCacheAspect>(
new ClassCacheAspect(),
setValues
);
}
This is an extension method, so any caller that wants to use its own IClassCacheAspect
could just copy/paste this one line of code and use its own aspect.
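For example, a caller with its own aspect implementation could define the same convenience overload for itself. The CustomCacheAspect name below is invented for illustration:
public static CustomCacheAspect SetCustomCacheAspectValues(
  this IMetaClass metaClass,
  Action<CustomCacheAspect> setValues)
{
  // CustomCacheAspect is a hypothetical caller-defined class that implements
  // IClassCacheAspect and ICopyTarget<IClassCacheAspect>.
  return metaClass.UpdateAspect<IClassCacheAspect, CustomCacheAspect>(
    new CustomCacheAspect(),
    setValues
  );
}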
This is actually pretty neat and clean:
Published by marco on 21. May 2016 10:58:43 (GMT-5)
Updated by marco on 21. May 2016 10:59:27 (GMT-5)
We’ve been doing more internal training at Encodo lately and one topic that we’ve started to tackle is design for architecture/APIs. Even if you’re not officially a software architect—designing and building entire systems from scratch—every developer designs code, on some level.
[A]lways
[B]e
[D]esigning
There are broad guidelines about how to format and style code, about how many lines to put in a method, about how many parameters to use, and so on. We strive for Clean Code™.
But the most important part of code is to think about how you’re writing it and what you’re building. You shouldn’t write a single line without thinking of the myriad ways in which it must fit into existing code and the established patterns and practices.
We’ve written about this before, in the two-part series called “Questions to consider when designing APIs” (Part I and Part II). Those two articles comprise a long list of aspects of a design to consider.
First make a good design, then compromise to fit project constraints.
Your project defines the constraints under which you can design. That is, we should still have our designer caps on, but the options available are much more strictly limited.
But, frustrating as that might be, it doesn’t mean you should stop thinking. A good designer figures out what would be optimal, then adjusts the solution to fit the constraints. Otherwise, you’ll forget what you were compromising from—and your design skills either erode or never get better.
We’ve been calling this concept ABD—Always Be Designing. [1] Let’s take a closer, concrete look, using a recent issue in the schema migration for Quino. Hopefully, this example illustrates how even the tiniest detail is important. [2]
We detected the problem when the schema migration generated an invalid SQL statement.
ALTER TABLE "punchclock__timeentry" ALTER COLUMN "personid" SET DEFAULT ;
As you can see, the default value is missing. It seems that there are situations where the code that generates this SQL is unable to correctly determine that a default value could not be calculated.
The code that calculates the default value is below.
result = Builder.GetExpressionPayload(
null,
CommandFormatHints.DefaultValue,
new ExpressionContext(prop),
prop.DefaultValueGenerator
);
To translate, there is a Builder
that produces a payload. We’re using that builder to get the payload (SQL, in this case) that corresponds to the DefaultValueGenerator
expression for a given property, prop
.
This method is an extension method of the IDataCommandBuilder
, reproduced below in full, with additional line-breaks for formatting:
public static string GetExpressionPayload<TCommand>(
this IDataCommandBuilder<TCommand> builder,
[CanBeNull] TCommand command,
CommandFormatHints hints,
IExpressionContext context,
params IExpression[] expressions)
{
if (builder == null) { throw new ArgumentNullException("builder"); }
if (context == null) { throw new ArgumentNullException("context"); }
if (expressions == null) { throw new ArgumentNullException("expressions"); }
return builder.GetExpressionPayload(
command,
hints,
context,
expressions.Select(
e => new ExecutableQueryItem<IExecutableExpression>(new ExecutableExpression(e))
)
);
}
This method does no more than to package each item in the expressions
parameter in an ExecutableQueryItem
and call the interface method.
The problem isn’t immediately obvious. It stems from the fact that each ExecutableQueryItem
can be marked as Handled
. The extension method ignores this feature, and always returns a result. The caller is unaware that the result may correspond to an only partially handled expression.
Our first instinct is, naturally, to try to figure out how we can fix the problem. [3] In the code above, we could keep a reference to the executable items and then check if any of them were unhandled, like so:
var executableItems = expressions.Select(
e => new ExecutableQueryItem<IExecutableExpression>(new ExecutableExpression(e))
);
var result = builder.GetExpressionPayload(command, hints, context, executableItems);
if (executableItems.Unhandled().Any())
{
// Now what?
}
return result;
}
We can detect if at least one of the input expressions could not be mapped to SQL. But we don’t know what to do with that information.
Do we return null instead? What can we return to indicate that the input expressions could not be mapped? Here we have the same problem as with throwing an exception: all callers assume that the result can be mapped.
So there’s no quick fix. We have to change an API. We have to design.
As with most bugs, the challenge lies not in knowing how to fix the bug, but in how to fix the underlying design problem that led to the bug. The problem is actually not in the extension method, but in the method signature of the interface method.
Instead of a single result, there are actually two results for this method call: whether the input expressions could be mapped at all, and the payload itself, if they could.
Instead of a Get
method, this is a classic TryGet
method.
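In general terms, the difference between the two shapes looks like this (an illustrative sketch, not the actual Quino signatures):
// Get: assumes a result can always be produced; failure surfaces as an exception
// or as a nonsensical value.
string GetPayload(IExpression expression);

// TryGet: failure is part of the contract; the boolean says whether the out
// parameter was actually filled in.
bool TryGetPayload(IExpression expression, out string payload);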
If this code is already in production, then you have to figure out how to introduce the bug fix without breaking existing code. If you already have consumers of your API, you can’t just change the signature and cause a compile error when they upgrade. You have to decorate the existing method with [Obsolete]
and make a new interface method.
So we don’t change the existing method and instead add the method TryGetExpressionPayload() to IDataCommandBuilder.
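As a sketch, the deprecation on the existing interface method might look like the following; the attribute message is invented, and the signature is the existing one shown a little further down:
[Obsolete("Use TryGetExpressionPayload() instead.")]
string GetExpressionPayload(
  [CanBeNull] TCommand command,
  CommandFormatHints hints,
  [NotNull] IExpressionContext context,
  [NotNull] IEnumerable<ExecutableQueryItem<IExecutableExpression>> expressions
);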
Now, let’s figure out what the parameters are going to be.
The method called by the extension method above has a slightly different signature. [5]
string GetExpressionPayload(
[CanBeNull] TCommand command,
CommandFormatHints hints,
[NotNull] IExpressionContext context,
[NotNull] IEnumerable<ExecutableQueryItem<IExecutableExpression>> expressions
);
That last parameter is a bit of a bear. What does it even mean? The signature of the extension method deals with simple IExpression
objects—I know what those are. But what are ExecutableQueryItems
and IExecutableExpressions
?
As an author and maintainer of the data driver, I know that these objects are part of the internal representation of a query as it is processed. But as a caller of this method, I’m almost never going to have a list of these objects, am I?
Let’s find out.
Me: Hey, ReSharper, how many callers of that method are there in the entire Quino source?
ReSharper: Just one, Dave. [6]
So, we defined an API with a signature that’s so hairy no-one calls it except through an extension method that makes the signature more palatable. And it introduces a bug. Lovely.
We’ve now figured out that our new method should accept a sequence of IExpression
objects instead of ExecutableQueryItem
objects.
How’s the signature looking so far?
bool TryGetExpressionPayload(
[CanBeNull] TCommand command,
CommandFormatHints hints,
[NotNull] IExpressionContext context,
[NotNull] IEnumerable<IExpression> expressions,
out string payload
);
Not quite. There are two things that are still wrong with this signature, both important.
One problem is that the rest of the IDataCommandBuilder<TCommand>
deals with a generic payload type and this method only works for builders where the target representation is a string. The Mongo driver, for example, uses MongoStorePayload
and MongoRetrievePayload
objects instead of strings and throws a NotSupportedException
for this API.
That’s not very elegant, but the Mongo driver was forced into that corner by the signature. Can we do better? The API would currently require Mongo to always return false
because our Mongo driver doesn’t know how to map anything to a string. But it could map to one of the aforementioned object representations.
If we change the out
parameter type from a string
to an object
, then any driver, regardless of payload representation, has at least the possibility of implementing this API correctly.
Another problem is that the order of parameters does not conform to the code style for Encodo.
- Passing null as the first parameter looks strange. The command can be null, so it should move after the two non-nullable parameters. If we move it all the way to the end, we can even make it optional. (And the hints should be third.)
- The most important parameter is the expressions, not the context. The first parameter should be the target of the method; the rest of the parameters provide context for that input.
- We can’t use params IExpression[]. Using params allows a caller to provide zero or more expressions, but it’s only allowed on the terminal parameter. Instead, we’ll accept an IEnumerable<IExpression>, which is more standard for the Quino library anyway.
The final method signature is below.
bool TryGetExpressionPayload(
[NotNull] IEnumerable<IExpression> expressions,
[NotNull] IExpressionContext context,
CommandFormatHints hints,
out object payload,
[CanBeNull] TCommand command = default(TCommand)
);
The schema migration called the original API like this:
result = Builder.GetExpressionPayload(
null,
CommandFormatHints.DefaultValue,
new ExpressionContext(prop),
prop.DefaultValueGenerator
);
return true;
The call with the new API—and with the bug fixed—is shown below. The only non-functional addition is that we have to call ToSequence()
on the first parameter (highlighted). Happily, though, we’ve fixed the bug and only include a default value in the field definition if one can actually be calculated.
object payload;
if (Builder.TryGetExpressionPayload(
prop.DefaultValueGenerator.ToSequence(),
new ExpressionContext(prop),
CommandFormatHints.DefaultValue,
out payload)
)
{
result = payload as string ?? payload.ToString();
return true;
}
A good rule of thumb is that if you find yourself explaining something in detail, it might still be too complicated. In that light, the call to ToSequence()
is a little distracting. [7] It would be nice to be able to map a single expression without having to pack it into a sequence.
So we have one more design decision to make: where do we add that method call? Directly to the interface, right? But the method for a single expression can easily be expressed in terms of the method we already have (as we saw above). It would be a shame if every implementor of the interface was forced to produce this boilerplate.
Since we’re using C#, we can instead extend the interface with a static method, as shown below (again, with more line breaks for this article):
public static bool TryGetExpressionPayload<TCommand>(
[NotNull] this IDataCommandBuilder<TCommand> builder, // Extend the builder
[NotNull] IExpression expression,
[NotNull] IExpressionContext context,
CommandFormatHints hints,
out object payload,
[CanBeNull] TCommand command = default(TCommand)
)
{
return builder.TryGetExpressionPayload(
expression.ToSequence(),
context,
hints,
out payload,
command
);
}
We not only avoided cluttering the interface with another method, but now a caller with a single expression doesn’t have to create a sequence for it [8], as shown in the final version of the call below.
object payload;
if (Builder.TryGetExpressionPayload(
prop.DefaultValueGenerator,
new ExpressionContext(prop),
CommandFormatHints.DefaultValue,
out payload)
)
{
result = payload as string ?? payload.ToString();
return true;
}
We saw in this post how we always have our designer/architect cap on, even when only fixing bugs. We took a look at a quick-fix and then backed out and realized that we were designing a new solution. Then we covered, in nigh-excruciating detail, our thought process as we came up with a new solution.
Many thanks to Dani for the original design and Sebastian for the review!
new [] { expression }, which I think is kind of ugly.
Published by marco on 12. May 2016 22:16:43 (GMT-5)
Updated by marco on 12. May 2016 22:30:34 (GMT-5)
Before taking a look at the roadmap, let’s quickly recap how far we’ve come. An overview of the release schedule shows a steady accretion of features over the years, as driven by customer or project needs.
The list below includes more detail on the releases highlighted in the graphic. [1]
We took 1.5 years to get to v1. The initial major version was to signify the first time that Quino-based code went into external production. [2]
After that, it took 6.5 years to get to v2. Although we added several large products that use Quino, we were always able to extend rather than significantly change anything in the core. The second major version was to signify sweeping changes made to address technical debt, to modernize certain components and to prepare for changes coming to the .NET platform.
It took just 5 months to get to v3 for two reasons:
So that’s where we’ve been. Where are we headed?
As you can see above, Quino is a very mature product that satisfies the needs of a wide array of software on all tiers. What more is there to add?
Quino’s design has always been driven by a combination of customer requirements and what we anticipated would be customer requirements.
We’re currently working on the following features.
Replacing the MetaBuilder in v3: we’re creating a more fluent, modern and extensible API for building metadata. We hope to be able to add these changes incrementally without introducing any breaking changes. [6]
A natural use of the rich metadata in Quino is to generate user interfaces for business entities without having to hand-tool each form. From the POC onward, Quino has included support for generating UIs for .NET Winforms.
Winforms has been replaced on the Windows desktop with WPF and UWP. We’ve gotten quite far with being able to generate WPF applications from Quino metadata. The screenshots below come from a pre-alpha version of the Sandbox application included in the Quino solution.
You may have noticed the lovely style of the new UI. [7] We’re using a VSG designed for us by Ergosign, for whom we’ve done some implementation work in the past.
If you’ve been following Microsoft’s announcements, things are moving quickly in the .NET world. There are whole new platforms available, if you target your software to run on them. We’re investigating the next target platforms for Quino. Currently that means getting the core of Quino—Quino.Meta
and its dependencies—to compile under .NET Core.
As you can see in the screenshot, we’ve got one of the toughest assemblies to compile—Encodo.Core
. After that, we’ll try for running some tests under Linux or OS X. The long-term goal is to be able to run Quino-based application and web servers on non-Windows—and, most importantly, non-IIS—platforms. [8]
These changes will almost certainly cause builds using previous versions to break. Look for any additional platform support in an upcoming major-version release.
For example, we split the Encodo and Quino assemblies into dozens of new, smaller and much more focused assemblies. Reorganizing configuration around the IOC and rewriting application startup for more than just desktop applications was another sweeping change.
Another example is the MetaBuilder, which started off as a helper class for assembling application metadata, but became a monolithic and unavoidable dependency, even in v2. In v3, we made the breaking changes to remove this component from its central role and will continue to replace its functionality with components that are more targeted, flexible and customizable.
Published by marco on 12. May 2016 22:11:29 (GMT-5)
The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.
- IDataSession and IApplication now directly implement the IServiceRequestHandler, and helper methods that used to extend IApplication now extend this interface instead, so calls like GetModel() can now be executed against an IApplication or an IDataSession. Many methods have been moved out of the IServiceRequestHandler interface to extension methods declared in the Encodo.IOC namespace. This move will require applications to update their usings. ReSharper will automatically find the correct namespace and apply it for you.
- ApplicationExtensions.GetInstance() has been replaced with a direct implementation of the IServiceRequestHandler by IApplication.
- MetaBuilder.Include() has been replaced with Dependencies.Include().
- When you use the newer overload of CreateModel(), you can no longer call CreateMainModule() because the main module is set up automatically. Although the call is marked as obsolete, it can only be combined with the older overload of CreateModel(). Using it with the newer overload will cause a runtime error as the main module is added to the model twice.
- The various methods to create paths with the MetaBuilder have been replaced by AddPath(). To rewrite a path, use the following style:
Builder.AddPath(
Elements.Classes.A.FromOne("Id"),
Elements.Classes.B.ToMany("FileId"),
path => path.SetMetaId(new Guid("…")).SetDeleteRule(MetaPathRule.Cascade),
idx => idx.SetMetaId(new Guid("…"))
);
Published by marco on 26. Apr 2016 21:40:40 (GMT-5)
Updated by marco on 27. Apr 2016 07:13:40 (GMT-5)
Encodo published its first C# Handbook on its web site in 2008. At the time, we also published it to several other standard places and got some good, positive feedback. Over the next year, I made some more changes and published new versions. The latest version is 1.5.2 and is available from Encodo’s web site. Since then, though, I’ve made a few extra notes and corrected a few errors, but never published an official version again.
This is not because Encodo hasn’t improved or modernized its coding guidelines, but because of several issues, listed below.
Some of the existing advice is outdated (e.g. the var advice) or just plain wrong (e.g. the var advice).
To address these issues and to accommodate the new requirements, here’s what we’re going to do:
These are the requirements and goals for a new version of the C# handbook.
The immediate next steps are:
I hope to have an initial, modern version ready within the next month or so.
Published by marco on 7. Apr 2016 22:27:10 (GMT-5)
“Unwritten code requires no maintenance and introduces no cognitive load.”
As I was working on another part of Quino the other day, I noticed that the oft-discussed registration and configuration methods [1] were a bit clunkier than I’d have liked. To wit, the methods that I tended to use together for configuration had different return types and didn’t allow me to freely mix calls fluently.
Register and Use
The return type for Register methods is IServiceRegistrationHandler and the return type for Use methods is IApplication (a descendant). The Register* methods come from the IOC interfaces, while the application builds on top of this infrastructure with higher-level Use* configuration methods.
This forces developers to write code in the following way to create and configure an application.
public IApplication CreateApplication()
{
var result =
new Application()
.UseStandard()
.UseOtherComponent();
result.
.RegisterSingle<ICodeHandler, CustomCodeHandler>()
.Register<ICodePacket, FSharpCodePacket>();
return result;
}
That doesn’t look too bad, though, does it? It doesn’t seem like it would cramp anyone’s style too much, right? Aren’t we being a bit nitpicky here?
That’s exactly why Quino 2.0 was released with this API. However, here we are, months later, and I’ve written a lot more configuration code and it’s really starting to chafe that I have to declare a local variable and sort my method invocations.
So I think it’s worth addressing. Anything that disturbs me as the writer of the framework—that gets in my way or makes me write more code than I’d like—is going to disturb the users of the framework as well.
Whether they’re aware of it or not.
In the best of worlds, users will complain about your crappy API and make you change it. In the world we’re in, though, they will cheerfully and unquestioningly copy/paste the hell out of whatever examples of usage they find and cement your crappy API into their products forever.
Do not underestimate how quickly calls to your inconvenient API will proliferate. In my experience, programmers really tend to just add a workaround for whatever annoys them instead of asking you to fix the problem at its root. This is a shame. I’d rather they just complained vociferously that the API is crap rather than using it and making me support it side-by-side with a better version for what usually feels like an eternity.
Maybe it’s because I very often have control over framework code that I will just not deal with bad patterns or repetitive code. Also I’ve become very accustomed to having a wall of tests at my beck and call when I bound off on another initially risky but in-the-end rewarding refactoring.
If you’re not used to this level of control, then you just deal with awkward APIs or you build a workaround as a band-aid for the symptom rather than going after the root cause.
So while the code above doesn’t trigger warning bells for most, once I’d written it a dozen times, my fingers were already itching to add [Obsolete]
on something.
I am well-aware that this is not a simple or cost-free endeavor. However, I happen to know that there aren’t that many users of this API yet, so the damage can be controlled.
If I wait, then replacing this API with something better later will take a bunch of versions, obsolete warnings, documentation and re-training until the old API is finally eradicated. It’s much better to use your own APIs—if you can—before releasing them into the wild.
Another more subtle reason why the API above poses a problem is that it’s more difficult to discover, to learn. The difference in return types will feel arbitrary to product developers. Code-completion is less helpful than it could be.
It would be much nicer if we could offer an API that helped users discover it at their own pace instead of making them step back and learn new concepts. Ideally, developers of Quino-based applications shouldn’t have to know the subtle difference between the IOC and the application.
Something like the example below would be nice.
return
new Application()
.UseStandard()
.RegisterSingle<ICodeHandler, CustomCodeHandler>()
.UseOtherComponent()
.Register<ICodePacket, FSharpCodePacket>();
Right? Not a gigantic change, but if you can imagine how a user would write that code, it’s probably a lot easier and more fluid than writing the first example. In the second example, they would just keep asking code-completion for the next configuration method and it would just be there.
In order to do this, I’d already created an issue in our tracker to parameterize the IServiceRegistrationHandler
type in order to be able to pass back the proper return type from registration methods.
I’ll show below what I mean, but I took a crack at it recently because I’d just watched the very interesting video Fun with Generics by Benjamin Hodgson (Vimeo), which starts off with a technique identical to the one I’d planned to use—and that I’d already used successfully for the IQueryCondition
interface. [2]
Let’s redefine the IServiceRegistrationHandler
interface as shown below,
public interface IServiceRegistrationHandler<TSelf>
{
TSelf Register<TService, TImplementation>()
where TService : class
where TImplementation : class, TService;
// …
}
Can you see how we pass the type we’d like to return as a generic type parameter? Then the descendants would be defined as,
public interface IApplication : IServiceRegistrationHandler<IApplication>
{
}
In the video, Hodgson notes that the technique has a name in formal notation, “F-bounded quantification” but that a snappier name comes from the C++ world, “curiously recurring template pattern”. I’ve often called it a self-referencing generic parameter, which seems to be a popular search term as well.
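To see the shape of the pattern outside of Quino, here is a minimal illustration; the names are invented for the example:
// The interface takes "itself" as a type parameter...
public interface IFluentNode<TSelf> where TSelf : IFluentNode<TSelf>
{
  TSelf WithName(string name);
}

// ...and each descendant closes the parameter over its own type, so a chained
// call keeps returning the most-derived interface.
public interface IMenuNode : IFluentNode<IMenuNode>
{
  IMenuNode WithShortcut(string shortcut);
}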
This is only the first step, though. The remaining work is to update all usages of the formerly non-parameterized interface IServiceRegistrationHandler
. This means that a lot of extension methods like the one below
public static IServiceRegistrationHandler RegisterCoreServices(
[NotNull] this IServiceRegistrationHandler handler)
{
}
will now look like this:
public static TSelf RegisterCoreServices<TSelf>(
[NotNull] this IServiceRegistrationHandler<TSelf> handler)
where TSelf : IServiceRegistrationHandler<TSelf>
{
}
This makes defining such methods more complex (again). [3] In my attempt at implementing this, Visual Studio indicated 170 errors remaining after I’d already updated a couple of extension methods.
Instead of continuing down this path, we might just want to follow the pattern we established in a few other places, by defining both a Register method, which uses the IServiceRegistrationHandler, and a Use method, which uses the IApplication.
Here’s an example of the corresponding “Use” method:
public static IApplication UseCoreServices(
[NotNull] this IApplication application)
{
if (application == null) { throw new ArgumentNullException("application"); }
application
.RegisterCoreServices()
.RegisterSingle(application.GetServices())
.RegisterSingle(application);
return application;
}
Though the technique involves a bit more boilerplate, it’s easy to write and understand (and reason about) these methods. As mentioned in the initial sentence of this article, the cognitive load is lower than the technique with generic parameters.
The only place where it would be nice to have an IApplication
return type is from the Register*
methods defined on the IServiceRegistrationHandler
itself.
We already decided that self-referential generic constraints would be too messy. Instead, we could define some extension methods that return the correct type. We can’t name the method the same as the one that already exists on the interface [4], though, so let’s prepend the word Use
, as shown below:
public static IApplication UseRegister<TService, TImplementation>(
[NotNull] this IApplication application)
where TService : class
where TImplementation : class, TService
{
if (application == null) { throw new ArgumentNullException("application"); }
application.Register<TService, TImplementation>();
return application;
}
That’s actually pretty consistent with the other configuration methods. Let’s take it for a spin and see how it feels. Now that we have an alternative way of registering types fluently without “downgrading” the result type from IApplication
to IServiceRegistrationHandler
, we can rewrite the example from above as:
return
new Application()
.UseStandard()
.UseRegisterSingle<ICodeHandler, CustomCodeHandler>()
.UseOtherComponent()
.UseRegister<ICodePacket, FSharpCodePacket>();
Instead of increasing cognitive load by trying to push the C# type system to places it’s not ready to go (yet), we use tiny methods to tweak the API and make it easier for users of our framework to write code correctly. [5]
If you define an extension method for a descendant type that has the same name as a method of an ancestor interface, the method-resolution algorithm for C# will never use it. Why? Because the directly defined method matches the name and all the types and is a “stronger” match than an extension method.
Perhaps an example is in order:
interface IA
{
IA RegisterSingle<TService, TConcrete>();
}
interface IB : IA { }
static class BExtensions
{
static IB RegisterSingle<TService, TConcrete>(this IB b) { return b; }
static IB UseStuff(this IB b) { return b; }
}
Let’s try to call the method from BExtensions
:
public void Configure(IB b)
{
b.RegisterSingle<IFoo, Foo>().UseStuff();
}
The call to UseStuff
cannot be resolved because the return type of the matched RegisterSingle
method is the IA
of the interface method not the IB
of the extension method. There is a solution, but you’re not going to like it (I know I don’t).
public void Configure(IB b)
{
BExtensions.RegisterSingle<IFoo, Foo>(b).UseStuff();
}
You have to specify the extension-method class’s name explicitly, which engenders awkward fluent chaining—you’ll have to nest these calls if you have more than one—but the desired method-resolution was obtained.
But at what cost? The horror…the horror. (IMDb)
Published by marco on 25. Mar 2016 13:41:54 (GMT-5)
The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.
- DateTimeExtensions.GetDayOfWeek() had a leap-day bug (QNO-5051)
- Changed how the sort order of GenericObjects is calculated, which fixes sorting issues in grids, specifically for non-persisted or transient objects (QNO-5137)
- Improved the IAccessControl API for getting groups and users and testing membership (QNO-5133)
- Add support for query aliases (e.g. for joining the same table multiple times) (QNO-531). This changes the API surface only minimally. Applications can pass an alias when calling the Join method, as shown below,
query.Join(Metadata.Project.Deputy, alias: "deputy")
You can find more examples of aliased queries in the TestAliasedQuery(), TestJoinAliasedTables() and TestJoinChildTwice() methods defined in the QueryTests testing fixture.
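For example, joining the same relation twice might look something like the sketch below; the second alias name is invented for illustration:
// Hypothetical: join the same Deputy relation twice under two different aliases
// so that both joins can be referenced independently later in the query.
query.Join(Metadata.Project.Deputy, alias: "deputy");
query.Join(Metadata.Project.Deputy, alias: "secondDeputy");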
- Added an IQueryAnalyzer for optimizations and in-memory mini-drivers (QNO-4830)
- ISchemaManager has been removed. Instead, you should retrieve the interface you were looking for from the IOC. The possible interfaces you might need are IImportHandler, IMappingBuilder, IPlanBuilder or ISchemaCommandFactory.
- ISchemaManagerSettings.GetAuthorized() has been moved to ISchemaManagerAuthorizer.
- The change to how the sort order of GenericObjects is calculated may have an effect on the way your application sorts objects.
- The IParticipantManager (base interface of IAccessControl) no longer has a single method called GetGroups(IParticipant). This method was previously used to get the groups to which a user belongs and the child groups of a given group. This confusing double duty for the API led to an incorrect implementation for both usages. Instead, there are now two methods:
IEnumerable<IGroup> GetGroups(IUser user): Gets the groups for the given user
IEnumerable<IGroup> GetChildGroups(IGroup group): Gets the child groups for the given group
The old method has been removed from the interface because (A) it never worked correctly anyway and (B) it conflicts with the new API.
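Callers now choose the method that matches their intent; in this sketch, the accessControl, user and group variables are assumed to already exist:
IEnumerable<IGroup> groups = accessControl.GetGroups(user);
IEnumerable<IGroup> childGroups = accessControl.GetChildGroups(group);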
Published by marco on 25. Mar 2016 13:41:30 (GMT-5)
This article was originally published on the Encodo Blogs.
This first-ever Voxxed Zürich was hosted at the cinema in the SihlCity shopping center in Zürich on March 3rd. All presentations were in English. The conference was relatively small—333 participants—and largely vendor-free. The overall technical level of the presentations and participants was quite high. I had a really nice time and enjoyed a lot of the presentations.
There was a nice common thread running through all of the presentations, starting with the Keynote. There’s a focus on performance and reliability through immutability, sequences, events, actors, delayed execution (lambdas, which are relatively new to Java), instances in the cloud, etc. It sounds very BUZZWORDY, but instead it came across as a very technically polished conference that reminded me of how many good developers there are trying to do the right thing. Looking forward to next year; hopefully Encodo can submit a presentation.
You can take a look at the VoxxedDays Zürich – Schedule. The talks that I visited are included below, with links to the presentation page, the video on YouTube and my notes and impressions. YMMV.
Life beyond the Illusion of the Present—Jonas Bonér
Kotlin − Ready for production—Hadi Hariri
Reactive Apps with Akka and AngularJS—Heiko Seeberger
During his talk, he took us through the following stages of building a scalable, resilient actor-based application with Akka.
AKKA Distributed Data
AKKA Cluster Sharding
AKKA Persistence
Akka looks pretty good. It guarantees the ordering because ACTORS. Any given actor only exists on any shard once. If a shard goes down, the actor is recreated on a different shard, and filled with information from the persistent store to “recreate” the state of that actor.
DDD (Domain-Driven Design) and the actor model. Watch Hewitt, Meijer and Szyperski: The Actor Model (everything you wanted to know, but were afraid to ask) (Channel9).
Code is on GitHub: seeberger/reactive_flows
Lambda core − hardcore—Jarek Ratajski
Focus on immutability and no side-effects. Enforced by the lambda calculus. Pretty low-level talk about lambda calculus. Interesting, but not applicable. He admitted as much at the top of the talk.
Links:
expect(“poo”).length.toBe(1)—Philip Hofstetter [1]
This was a talk about expectations of the length of a character. The presenter was very passionate about his talk and went into an incredible amount of detail.
How usability fits in UX − it’s no PICNIC—Myriam Jessier
What should a UI be?
Also nice to have:
Book recommendation: Don’t make me think by Steve Krug
Guidelines:
Guidelines for mobile:
Suggested usability testing tools:
React − A trip to Russia isn’t all it seems—Josh Sephton [3]
This talk was about Web UI frameworks and how his team settled on React.
The reactor programming model for composable distributed computing—Aleksandar Prokopec [4]
Published by marco on 25. Mar 2016 13:39:04 (GMT-5)
At the beginning of the year, we worked on an interesting project that dipped into IOT (Internet of Things). The project was to create use cases for Crealogix’s banking APIs in the real world. Concretely, we wanted to show how a customer could use these APIs in their own workflows. The use cases were to provide proof of the promise of flexibility and integrability offered by well-designed APIs.
Watch 7–minute video of the presentation
The first use case is for the treasurer of a local football club. The treasurer wants to be notified whenever an annual club fee is transferred from a member. The club currently uses a Google Spreadsheet to track everything, but it’s updated manually. It would be really nice if the banking API could be connected—via some scripting “glue”—to update the spreadsheet directly, without user intervention. The treasurer would just see the most current numbers whenever he opened the spreadsheet.
The spreadsheet is in addition to the up-to-date view of payments in the banking app. The information is also available there, but not necessarily in the form that he or she would like. Linking automatically to the spreadsheet is the added value.
Imagine a family with a young son who wants to buy a drone. He would have to earn it by doing chores. Instead of tracking this manually, the boy’s chores would be tabulated automatically, moving money from the parents’ account to his own as he did chores. Additionally, a lamp in the boy’s room would glow a color indicating how close he was to his goal. The parents wanted to track the boy’s progress in a spreadsheet, tracking the transfers as they would have had they not had any APIs.
The idea is to provide added value to the boy, who can record his chores by pressing a button and see his progress by looking at a lamp’s color. The parents get to stay in their comfort zone, working with a spreadsheet as usual, but having the data automatically entered in the spreadsheet.
It’s a bit of a stretch, but it sufficed to ground the relatively abstract concept of banking APIs in an example that non-technical people could follow.
So we needed to pull quite a few things together to implement these scenarios.
Either of these—just judging from their websites—would be sufficient to utterly and completely change our lives. The Hue looked like it was going to turn us into musicians, so we went with Lifx, which only threatened to give us horn-rimmed glasses and a beard (and probably skinny jeans and Chuck Taylor knockoffs).
Yeah, we think the marketing for what is, essentially, a light-bulb, is just a touch overblown. Still, you can change the color of the light bulb with a SmartPhone app, or control it via API (which is what we wanted to do).
The button sounds simple. You’d think that, in 2016, these things would be as ubiquitous as AOL CDs were in the 1990s. You’d be wrong.
There’s a KickStarter project called Flic that purports to have buttons that send signals over a wireless connection. They cost about CHF20. Though we ordered some, we never saw any because of manufacturing problems. If you thought the hype and marketing for a light bulb were overblown, then you’re sure to enjoy how Flic presents a button.
We quickly moved along a parallel track to get buttons that can be pressed in real life rather than just viewed from several different angles and in several different colors online.
Amazon has what they have called “Dash” buttons that customers can press to add predefined orders to their one-click shopping lists. The buttons are bound to certain household products that you tend to purchase cyclically: toilet paper, baby wipes, etc.
They sell them dirt-cheap—$5—but only to Amazon Prime customers—and only to customers in the U.S. Luckily, we knew someone in the States willing to let us use his Amazon Prime account to deliver them, naturally only to a domestic address, from which they would have to be forwarded to us here in Switzerland.
That we couldn’t use them to order toilet paper in the States didn’t bother us—we were planning to hack them anyway.
These buttons showed up after a long journey and we started trapping them in our own mini-network so that we could capture the signal they send and interpret it as a trigger. This was not ground-breaking stuff, but we really wanted the demonstrator to be able to press a physical button on stage to trigger the API that would cascade other APIs and so on.
Of course we could have just hacked the whole thing so that someone presses a button on a screen somewhere—and we programmed this as a backup plan—but the physicality of pressing a button was the part of the demonstration that was intended to ground the whole idea for non-technical users. [1]
If you’re going to use an API to modify a spreadsheet, then that spreadsheet has to be available online somewhere. The spreadsheet application in Google Docs is a good candidate.
The API allows you to add or modify existing data, but that’s pretty much it. When you make changes, they show up immediately, with no ceremony. That, unfortunately, doesn’t make for a very nice-looking demo.
Google Docs also offers a JavaScript-like scripting language that lets you do more. We wanted to not only insert rows, we wanted charts to automatically update and move down the page to accommodate the new row. All animated, thank you very much.
This took a couple pages of scripting and a good amount of time. It’s also no longer a solution that an everyday user is likely to make themselves. And, even though we pushed as hard as we could, we also didn’t get everything we wanted. The animation is very jerky (watch the video linked above) but gets the job done.
So we’ve got a bunch of pieces that are all capable of communicating in very similar ways. The final step is to glue everything together with a bit of script. There are several services available online, like IFTTT—If This Then That—that allow you to code simple logic to connect signals to actions.
In our system, we had the following signals:
and the following actions:
So we’re going to betray a tiny secret here. Although the product demonstrated on-stage did actually do what it said, it didn’t do it using the Crealogix API to actually transfer money. That’s the part that we were actually selling and it’s the part we ended up faking/mocking out because the actual transfer is beside the point. Setting up bank accounts is not so easy, and the banks take umbrage at creating them for fake purposes.
Crealogix could have let us use fake testing accounts, but even that would have been more work than it was worth: if we’re already faking, why not just fake in the easiest way possible by skipping the API call to Crealogix and only updating the spreadsheet?
Likewise, the entire UI that we included in the product was mocked up to include only the functionality required by the demonstration. You can see an example here—of the login screen—but other screens are linked throughout this article. Likewise, the Bank2Things screen shown above and to the left is a mockup.
So what did Encodo actually contribute?
As last year—when we helped Crealogix create the prototype for their BankClip for Finovate 2015—we had a lot of fun investigating all of these cutting-edge technologies and putting together a custom solution in time for Finovate 2016.
Published by marco on 27. Feb 2016 12:36:39 (GMT-5)
Updated by marco on 27. Feb 2016 12:52:18 (GMT-5)
In several articles last year [1], I went into a lot of detail about the configuration and startup for Quino applications. Those posts discuss a lot about what led to the architecture Quino has for loading up an application.
Some of you might be wondering: what if I want to start up and run an application that doesn’t use Quino? Can I build applications that don’t use any fancy metadata because they’re super-simple and don’t even need to store any data? Those are the kind of utility applications I make all the time; do you have anything for me, you continue to wonder?
As you probably suspected from the leading question: You’re in luck. Any functionality that doesn’t need metadata is available to you without using any of Quino. We call this the “Encodo” libraries, which are the base on which Quino is built. Thanks to the fundamental changes made in Quino 2, you have a wealth of functionality available in just the granularity you’re looking for.
Instead of writing such small applications from scratch—and we know we could write them—why would we want to leverage existing code? What are the advantages of doing this?
What are potential disadvantages?
A developer unfamiliar with a library—or one who is too impatient to read up on it—will feel these disadvantages more acutely and earlier.
Let’s take a look at some examples below to see how the Encodo/Quino libraries stack up. Are we able to profit from the advantages without suffering from the disadvantages?
We’re going to take a look at two simple applications:
The actual service-registration part is boilerplate generated by Microsoft Visual Studio [2], but we’d like to replace the hard-coded strings with customized data obtained from a configuration file. So how do we get that data?
That doesn’t sound that hard, right? I’m sure you could just whip something together with an XMLDocument
and some hard-coded paths and filenames that would do the trick. [3] It might even work on the first try, too. But do you really want to bother with all of that? Wouldn’t you rather just get the scaffolding for free and focus on the part where you load your settings?
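For the record, the quick-and-dirty approach might look something like the sketch below; the path is invented, and the element names mirror the configuration file listed in the footnotes.
// The hand-rolled version, for contrast.
var doc = new System.Xml.XmlDocument();
doc.Load(@"C:\MyService\service-settings.xml");
var name = doc.SelectSingleNode("/config/service/name")?.InnerText;
var displayName = doc.SelectSingleNode("/config/service/displayName")?.InnerText;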
The following listing shows the main application method, using the Encodo/Quino framework libraries to do the heavy lifting.
[NotNull]
public static ServiceSettings LoadServiceSettings()
{
ServiceSettings result = null;
var transcript = new ApplicationManager().Run(
CreateServiceConfigurationApplication,
app => result = app.GetInstance<ServiceSettings>()
);
if (transcript.ExitCode != ExitCodes.Ok)
{
throw new InvalidOperationException(
"Could not read the service settings from the configuration file." +
new SimpleMessageFormatter().GetErrorDetails(transcript.Messages)
);
}
return result;
}
If you’ve been following along in the other articles (see first footnote below), then this structure should be very familiar. We use an ApplicationManager()
to execute the application logic, creating the application with CreateServiceConfigurationApplication
and returning the settings configured by the application in the second parameter (the “run” action). If anything went wrong, we get the details and throw an exception.
You can’t see it, but the library provides debug/file logging (if you enable it), debug/release mode support (exception-handling, etc.) and everything is customizable/replaceable by registering with an IOC.
Soooo…I can see where we’re returning the ServiceSettings
, but where are they configured? Let’s take a look at the second method, the one that creates the application.
private static IApplication CreateServiceConfigurationApplication()
{
var application = new Application();
application
.UseSimpleInjector()
.UseStandard()
.UseConfigurationFile("service-settings.xml")
.Configure<ServiceSettings>(
"service",
(settings, node) =>
{
settings.ServiceName = node.GetValue("name", settings.ServiceName);
settings.DisplayName = node.GetValue("displayName", settings.DisplayName);
settings.Description = node.GetValue("description", settings.Description);
settings.Types = node.GetValue("types", settings.Types);
}
).RegisterSingle<ServiceSettings>();
return application;
}
- Application, defined in the Encodo.Application assembly. What does this class do? It does very little other than manage the main IOC (see articles linked in the first footnote for details).
- UseSimpleInjector(). Quino includes support for the SimpleInjector IOC out of the box. As you can see, you must include this support explicitly, so you’re also free to assign your own IOC (e.g. one using Microsoft’s Unity). SimpleInjector is very lightweight and super-fast, so there’s no downside to using it.
- UseStandard(), defined in the Encodo.Application.Standard assembly. Since I know that UseStandard() pulls in what I’m likely to need, I’ll just use that. [4]
- The ServiceSettings object that we want to return. For that, there’s a Configure method that returns an object from the IOC along with a specific node from the configuration data. This method is called only if everything started up OK.
- RegisterSingle makes sure that the ServiceSettings object created by the IOC is a singleton (it would be silly to configure one instance and return another, unconfigured one).
Basically, because this application is so simple, it has already accomplished its goal by the time the standard startup completes. At the point that we would “run” this application, the ServiceSettings object is already configured and ready for use. That’s why, in LoadServiceSettings(), we can just get the settings from the application with GetInstance() and exit immediately.
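Tying it back to the service-installer boilerplate mentioned earlier, the caller might then look something like this sketch (the installer properties come from the listing in the footnotes):
var settings = LoadServiceSettings();

var fileService = new ServiceInstaller();
fileService.StartType = ServiceStartMode.Automatic;
fileService.ServiceName = settings.ServiceName;
fileService.DisplayName = settings.DisplayName;
fileService.Description = settings.Description;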
The code generator has a bit more code, but follows the same pattern as the simple application above. In this case, we use the command line rather than the configuration file to get user input.
The main method defers all functionality to the ApplicationManager
, passing along two methods, one to create the application, the other to run it.
internal static void Main()
{
new ApplicationManager().Run(CreateApplication, GenerateCode);
}
As before, we first create an Application
, then choose the SimpleInjector and some standard configuration and registrations with UseStandard()
, UseMetaStandardServices()
and UseMetaTools()
. [6]
We set the application title to “Quino Code Generator” and then include objects with UseSingle()
that will be configured from the command line and used later in the application. [7] And, finally, we add our own ICommandSet
to the command-line processor that will configure the input and output settings. We’ll take a look at that part next.
private static IApplication CreateApplication(
IApplicationCreationSettings applicationCreationSettings)
{
var application = new Application();
return
application
.UseSimpleInjector()
.UseStandard()
.UseMetaStandardServices()
.UseMetaTools()
.UseTitle("Quino Code Generator")
.UseSingle(new CodeGeneratorInputSettings())
.UseSingle(new CodeGeneratorOutputSettings())
.UseUnattendedCommand()
.UseCommandSet(CreateGenerateCodeCommandSet(application))
.UseConsole();
}
The final bit of the application configuration is to see how to add items to the command-line processor.
Basically, each command set consists of required values, optional values and zero or more switches that are considered part of a set.
The one for i simply sets the value of inputSettings.AssemblyFilename
to whatever was passed on the command line after that parameter. Note that it pulls the inputSettings
from the application to make sure that it sets the values on the same singleton reference as will be used in the rest of the application.
The code below shows only one of the code-generator–specific command-line options. [8]
private static ICommandSet CreateGenerateCodeCommandSet(
IApplication application)
{
var inputSettings = application.GetSingle<CodeGeneratorInputSettings>();
var outputSettings = application.GetSingle<CodeGeneratorOutputSettings>();
return new CommandSet("Generate Code")
{
Required =
{
new OptionCommandDefinition<string>
{
ShortName = "i",
LongName = "in",
Description = Resources.Program_ParseCommandLineArgumentIn,
Action = value => inputSettings.AssemblyFilename = value
},
// And others…
},
};
}
Finally, let’s take a look at the main program execution for the code generator. It shouldn’t surprise you too much to see that the logic consists mostly of getting objects from the IOC and telling them to do stuff with each other. [9]
I’ve highlighted the code-generator–specific objects in the code below. All other objects are standard library tools and interfaces.
private static void GenerateCode(IApplication application)
{
var logger = application.GetLogger();
var inputSettings = application.GetInstance<CodeGeneratorInputSettings>();
if (!inputSettings.TypeNames.Any())
{
logger.Log(Levels.Warning, "No types to generate.");
}
else
{
var modelLoader = application.GetInstance<IMetaModelLoader>();
var metaCodeGenerator = application.GetInstance<IMetaCodeGenerator>();
var outputSettings = application.GetInstance<CodeGeneratorOutputSettings>();
var modelAssembly = AssemblyTools.LoadAssembly(
inputSettings.AssemblyFilename, logger
);
outputSettings.AssemblyDetails = modelAssembly.GetDetails();
foreach (var typeName in inputSettings.TypeNames)
{
metaCodeGenerator.GenerateCode(
modelLoader.LoadModel(modelAssembly, typeName),
outputSettings,
logger
);
}
}
}
So that’s basically it: no matter how simple or complex your application, you configure it by indicating what stuff you want to use, then use all of that stuff once the application has successfully started. The Encodo/Quino framework provides a large amount of standard functionality. It’s yours to use as you like and you don’t have to worry about building it yourself. Even your tiniest application can benefit from sophisticated error-handling, command-line support, configuration and logging without lifting a finger.
That boilerplate looks like this:
var fileService = new ServiceInstaller();
fileService.StartType = ServiceStartMode.Automatic;
fileService.DisplayName = "Quino Sandbox";
fileService.Description = "Demonstrates a Quino-based service.";
fileService.ServiceName = "Sandbox.Services";
See the ServiceInstaller.cs
file in the Sandbox.Server
project in Quino 2.1.2 and higher for the full listing.
The standard implementation of Quino’s ITextKeyValueNodeReader supports XML, but it would be trivial to create and register a version that supports JSON (QNO-4993) or YAML. The configuration file for the utility looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<config>
<service>
<name>Quino.Services</name>
<displayName>Quino Utility</displayName>
<description>The application to run all Quino backend services.</description>
<types>All</types>
</service>
</config>
If you look at the implementation of the UseStandard
method [10], it pulls in a lot of stuff, like support for BCrypt, enhanced CSV and enum-value parsing and standard configuration for various components (e.g. the file log and command line). It’s called “Standard” because it’s the stuff we tend to use in a lot of applications.
But that method is just a composition of over a dozen other methods. If, for whatever reason (perhaps dependencies), you don’t want all of that functionality, you can just call the subset of methods that you do want. For example, you could call UseApplication()
from the Encodo.Application
assembly instead. That method includes only the support for:
- ICommandSetManager
- ILocationManager
- IConfigurationDataLoader
- IExternalLoggerFactory
- IApplicationManager
If you want to go even lower than that, you can try UseCore()
, defined in the Encodo.Core
assembly and then pick and choose the individual components yourself. Methods like UseApplication()
and UseStandard()
are tried and tested defaults, but you’re free to configure your application however you want, pulling from the rich trove of features that Quino offers.
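For example, an application could compose its own default out of the smaller methods, as sketched below; the method name and the particular combination are invented for illustration:
// A sketch of a custom composition; UseMyDefaults is an invented name, and the
// combination of calls is illustrative rather than a recommended set.
public static IApplication UseMyDefaults(this IApplication application)
{
  return application
    .UseSimpleInjector()
    .UseApplication();
}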
By default, the application will look for this file next to the executable. You can configure this as well, by getting the location manager with GetLocationManager()
and setting values on it.
You’ll notice that I didn’t use Configure<ILocationManager>()
for this particular usage. That’s ordinarily the way to go if you want to make changes to a singleton before it is used. However, if you want to change where the application looks for configuration files, then you have to change the location manager before it’s used and before any other configuration takes place. It’s a special object that is available before the IOC has been fully configured. To reiterate from other articles (because it’s important), the order of operations we’re interested in here is:
- constructing the application (calling Use*() to build the application)
- loading the configuration data from the location named by LocationNames.Configuration
- applying the Configure() callbacks
If you want to change the configuration-file location, then you have to get in there before the startup starts running—and that’s basically during application construction. Alternatively, you could also call UseConfigurationDataLoader()
to register your own object to actually load configuration data and do whatever the heck you like in there, including returning constant data. :-)
The metadata analog of UseStandard() is UseMetaStandard(), but we don’t call that. Instead, we call UseMetaStandardServices(). Why? The answer is that we want the code generator to be able to use some objects defined in Quino, but the code generator itself isn’t a metadata-based application. We want to include the IOC registrations required by metadata-based applications without adding any of the startup or shutdown actions. Many of the standard Use*() methods included in the base libraries have analogs like this. The Use*Services() analogs are also very useful in automated tests, where you want to be able to create objects but don’t want to add anything to the startup.
Why not just use RegisterSingle()? For almost any object, we could totally do that. But objects used during the first stage of application startup—before the IOC is available—must go in the other IOC, accessed with SetSingle() and GetSingle().
See Program.cs in the Quino.CodeGenerator project in any 2.x version of Quino.
This code uses GetInstance() instead of GetSingle() because the IOC is now available and all singletons are mirrored from the startup IOC to the main IOC. In fact, once the application is started, it’s recommended to use GetInstance() everywhere, for consistency and to prevent the subtle distinction between IOCs—present only in the first stage of startup—from bleeding into your main program logic.
You can use a decompiler like DotPeek on UseStandard() to decompile the method. In the latest DotPeek, the extension methods are even displayed much more nicely in decompiled form.
Published by marco on 17. Jan 2016 22:27:10 (GMT-5)
Updated by marco on 18. Jan 2016 07:12:55 (GMT-5)
The article Learn you Func Prog on five minute quick! by Verity Stob (The Register) provides a typically twisted and unhelpful overview of the state of functional programming in this 21st-century renaissance—heralded decades ago by Lisp programmers. It includes an honest overview of the major players, including Scala, for which the “pro” and “con” are the same (a “[c]lose relationship with Java […]”) and ending with JavaScript, for which the “pro” is “It’s what you’ll end up using.”
The discussion continues with rules: variable immutability, function purity, curryability and monadicity, which is where things really go off the rails. Property 7 dribbles to a shuddering halt with,
“All monads define a unit() function called of(), a bind() function called map() and a type constructor function called…
“Wait a minute. Wait a minute. Perhaps bind() is a functor not a function. I’m pretty sure about that. Hold on to the horses a moment there while I look it up.
“…And I should perhaps clarify that this bind() and map() is nothing to do with any other bind() or map() methods or functions that you might be familiar with, although their actions are in some sense quite similar.
“Summary: It has been an honour and a pleasure to clear all that up for you.
“Final Reader’s comment: My gratitude is inexpressible. [1]”
Which is not to say that I don’t enjoy immensely the functional aspects of C#. I do. I also have read a lot about monads and am completely familiar with the tragically bad and unenlightening explanations. Stob captures this elegantly with the following corollary to Rule 4:
“If you should by some accident come to understand what a Monad is, you will simultaneously lose the ability to explain it to anybody else.”
In this second part, we’ll... [More]
Published by marco on 16. Jan 2016 12:53:04 (GMT-5)
In part I of this series, we discussed some core concepts of profiling. In that article, we discussed not only the problem at hand, but also how to think about fixing performance problems and reducing the likelihood that they get out of hand in the first place.
In this second part, we’ll go into detail and try to fix the problem.
Since we have new requirements for an existing component, it’s time to reconsider the requirements for all stakeholders. In terms of requirements, the IScope
can be described as follows:
There is more detail, but that should give you enough information to understand the code examples that follow.
There are many ways of implementing the functional requirements listed above. While you can implement the feature from the requirements alone, it’s very helpful to know usage patterns when trying to optimize code.
Therefore, we’d like to know exactly what kind of contract our code has to implement—and to not implement any more than was promised.
Sometimes a hopeless optimization task gets a lot easier when you realize that you only have to optimize for a very specific situation. In that case, you can leave the majority of the code alone and optimize a single path through the code to speed up 95% of the calls. All other calls, while perhaps a bit slow, will at least still yield the correct results.
And “optimized” doesn’t necessarily mean that you have to throw all of your language’s higher-level constructs out the window. Once your profiling tool tells you that a particular bit of code has introduced a bottleneck, it often suffices to just examine that particular bit of code more closely. Just picking the low-hanging fruit will usually be more than enough to fix the bottleneck. [1]
I saw in the profiler that creating the ExpressionContext
had gotten considerably slower. Here’s the code in the constructor.
foreach (var value in values.Where(v => v != null))
{
Add(value);
}
I saw a few potential problems immediately:
- Add() had gotten more expensive in order to return the most appropriate object from the GetInstances() method
- AddRange()
The faster version is below:
var scope = CurrentScope;
for (var i = 0; i < values.Length; i++)
{
var value = values[i];
if (value != null)
{
scope.AddUnnamed(value);
}
}
Why is this version faster? The code now uses the fact that we know we’re dealing with an indexable list to avoid allocating an enumerator and to use non-allocating means of checking null. While the Linq code is highly optimized, a for
loop is always going to be faster because it’s guaranteed not to allocate anything. Furthermore, we now call AddUnnamed()
to use the faster registration method because the more involved method is never needed for these objects.
The optimized version is less elegant and harder to read, but it’s not terrible. Still, you should use these techniques only if you can prove that they’re worth it.
CurrentScope
Another minor improvement is that the call to retrieve the scope is made only once regardless of how many objects are added. On the one hand, we might expect only a minor improvement since we noted above that most use cases only ever add one object anyway. On the other, however, we know that we call the constructor 20 million times in at least one test, so it’s worth examining.
The call to CurrentScope
gets the last element of the list of scopes. Even something as innocuous as calling the Linq extension method Last()
can get more costly than it needs to be when your application calls it millions of times. Of course, Microsoft has decorated its Linq calls with all sorts of compiler hints for inlining and, of course, if you decompile, you can see that the method itself is implemented to check whether the target of the call is a list and use indexing, but it’s still slower. There is still an extra stack frame (unless inlined) and there is still a type-check with as
.
Replacing a call to Last()
with getting the item at the index of the last position in the list is not recommended in the general case. However, making that change in a provably performance-critical area shaved a percent or two off a test run that takes about 45 minutes. That’s not nothing.
protected IScope CurrentScope
{
get { return _scopes.Last(); }
}
protected IScope CurrentScope
{
get { return _scopes[_scopes.Count - 1]; }
}
That takes care of the creation & registration side, where I noticed a slowdown when creating the millions of ExpressionContext
objects needed by the data driver in our product’s test suite.
Let’s now look at the evaluation side, where objects are requested from the context.
The offending, slow code is below:
public IEnumerable<TService> GetInstances<TService>()
{
var serviceType = typeof(TService);
var rawNameMatch = this[serviceType.FullName];
var memberMatches = All.OfType<TService>();
var namedMemberMatches = NamedMembers.Select(
item => item.Value
).OfType<TService>();
if (rawNameMatch != null)
{
var nameMatch = (TService)rawNameMatch;
return
nameMatch
.ToSequence()
.Union(namedMemberMatches)
.Union(memberMatches)
.Distinct(ReferenceEqualityComparer<TService>.Default);
}
return namedMemberMatches.Union(memberMatches);
}
As you can readily see, this code isn’t particularly concerned about performance. It is, however, relatively easy to read, and it’s easy to figure out the logic behind returning objects. As long as no-one really needs this code to be fast—if it’s not used that often and not used in tight loops—it doesn’t matter. What matters more is legibility and maintainability.
But we now know that we need to make it faster, so let’s focus on the most-likely use cases. I know the following things:
- Scope instances are created with a single object in them and no other objects are ever added.
- Callers generally ask for a single object (FirstOrDefault()).
These extra bits of information will allow me to optimize the already-correct implementation to be much, much faster for the calls that we’re likely to make.
The optimized version is below:
public IEnumerable<TService> GetInstances<TService>()
{
var members = _members;
if (members == null)
{
yield break;
}
if (members.Count == 1)
{
if (members[0] is TService)
{
yield return (TService)members[0];
}
yield break;
}
object exactTypeMatch;
if (TypedMembers.TryGetValue(typeof(TService), out exactTypeMatch))
{
yield return (TService)exactTypeMatch;
}
foreach (var member in members.OfType<TService>())
{
if (!ReferenceEquals(member, exactTypeMatch))
{
yield return member;
}
}
}
Given the requirements, the handful of use cases and decent naming, you should be able to follow what’s going on above. The code contains many more escape clauses for common and easily handled conditions, handling them in an allocation-free manner wherever possible.
You’ll notice that returning a value added by-name is not a requirement and has been dropped. Improving performance by removing code for unneeded requirements is a perfectly legitimate solution.
And, finally, how did we do? I created tests for the following use cases:
Here are the numbers from the automated tests.
This looks amazing but remember: while the optimized solution may be faster than the original, all we really know is that we’ve just managed to claw our way back from the atrocious performance characteristics introduced by a recent change. We expect to see vast improvements versus a really slow version.
Since I know that these calls showed up as hotspots and were made millions of times in the test, the performance improvement shown by these tests is enough for me to deploy a pre-release of Quino via TeamCity, upgrade my product to that version and run the tests again. Wish me luck! [4]
The All members contained a hidden call to the Linq call Reverse(), which slowed things down even more! I removed the call to reverse all elements because (A) I don’t actually have any tests for the LIFO requirement nor (B) do I have any other code that expects it to happen. I wasn’t about to make the code even more complicated and possibly slower just to satisfy a purely theoretical requirement. That’s the kind of behavior that got me into this predicament in the first place.
Published by marco on 13. Jan 2016 07:05:23 (GMT-5)
An oft-quoted bit of software-development sagacity is
“Premature optimization is the root of all evil.”
As is so often the case with quotes—especially those on the Internet [1]—this one has a slightly different meaning in context. The snippet above invites developers to overlook the word “premature” and interpret the received wisdom as “you don’t ever need to optimize.”
Instead, Knuth’s full quote actually tells you how much of your code is likely to be affected by performance issues that matter (highlighted below).
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
In other articles, I’d mentioned that we’d upgraded several solutions to Quino 2 in order to test that the API was solid enough for a more general release. One of these products is both quite large and has a test suite of almost 1500 tests. The product involves a lot of data-import and manipulation and the tests include several scenarios where Quino is used very intensively to load, process and save data.
These tests used to run in a certain amount of time, but started taking about 25% longer after the upgrade to Quino 2.
Before doing anything else—making educated guesses as to what the problem could be, for example—we measure. At Encodo, we use JetBrains DotTrace to collect performance profiles.
There is no hidden secret: the standard procedure is to take a measurement before and after the change and to compare them. However, so much had changed from Quino 1.13 to Quino 2—e.g. namespaces and type names had changed—that while DotTrace was able to show some matches, the comparisons were not as useful as usual.
A comparison between codebases that hadn’t changed so much is much easier, but I didn’t have that luxury.
Even excluding the less-than-optimal comparison, it was an odd profile. Ordinarily, one or two issues stick out right away, but the slowness seemed to suffuse the entire test run. Since the direct profiling comparison was difficult, I downloaded test-speed measurements as CSV from TeamCity for the product where we noticed the issue.
How much slower, you might ask? The test that I looked at most closely took almost 4 minutes (236,187ms) in the stable version, but took 5:41 in the latest build.
This test was definitely one of the largest and longest tests, so it was particularly impacted. Most other tests that imported and manipulated data ranged anywhere from 10% to 30% slower.
When I looked for hot-spots, the profile unsurprisingly showed me that database access took up the most time. The issue was more subtle: while database-access still used the most time, it was using a smaller percentage of the total time. Hot-spot analysis wasn’t going to help this time. Sorting by absolute times and using call counts in the tracing profiles yielded better clues.
The tests were slower when saving and also when loading data. But I knew that the ORM code itself had barely changed at all. And, since the product was using Quino so heavily, the stack traces ran quite deep. After a lot of digging, I noticed that creating the ExpressionContext
to hold an object while evaluating expressions locally seemed to be taking longer than before. This was my first, real clue.
Once I was on the trail, I found that when evaluating calls (getting objects) that used local evaluation, it was also always slower.
Once you start looking for places where performance is not optimal, you’re likely to start seeing them everywhere. However, as noted above, 97% of them are harmless.
To be clear, we’re not optimizing because we feel that the framework is too slow but because we’ve determined that the framework is now slower than it used to be and we don’t know why.
Even after we’ve finished restoring the previous performance (or maybe even making it a little better), we might still be able to easily optimize further, based on other information that we gleaned during our investigation.
But we want to make sure that we don’t get distracted and start trying to FIX ALL THE THINGS instead of just focusing on one task at a time. While it’s somewhat disturbing that we seem to be creating 20 million ExpressionContext
objects in a 4-minute test, that is also how we’ve always done it, and no-one has complained about the speed up until now.
Sure, if we could reduce that number to only 2 million, we might be even faster [3], but the point is that we used to be faster on the exact same number of calls—so fix that first.
I found a likely candidate in the Scope
class, which implements the IScope
interface. This type is used throughout Quino, but the two use-cases that affect performance are:
- The ExpressionContext, which holds the named values and objects to be used when evaluating the value of an IExpression. These expressions are used everywhere in the data driver.
The former usage has existed unchanged for years; its implementation is unlikely to be the cause of the slowdown. The latter usage is new and I recall having made a change to the semantics of which objects are returned by the Scope
in order to make it work there as well.
You may already be thinking: smooth move, moron. You changed the behavior of a class that is used everywhere for a tacked-on use case. That’s definitely a valid accusation to make.
In my defense, my instinct is to reuse code wherever possible. If I already have a class that holds a list of objects and gives me back the object that matches a requested type, then I will use that. If I discover that the object that I get back isn’t as predictable as I’d like, then I improve the predictability of the API until I’ve got what I want. If the improvement comes at no extra cost, then it’s a win-win situation. However, this time I paid for the extra functionality with degraded performance.
Where I really went wrong was that I’d made two assumptions:
I think a few words on process here are important. Can we improve the development process so that this doesn’t happen again? One obvious answer would be to avoid changing a type shared by different systems without considering all stakeholder requirements. That’s a pretty tall order, though. Including this in the process will most likely lead to less refactoring and improvement out of fear of breaking something.
We discussed above how completely reasonable assumptions and design decisions led to the performance degradation. So we can’t be sure it won’t happen again. What we would like, though, is to be notified quickly when there is performance degradation, so that it appears as a test failure.
Our requirements are captured by tests. If all of the tests pass, then the requirements are satisfied. Performance is a non-functional requirement. Where we could improve Quino is to include high-level performance tests that would sound the alarm the next time something like this happens. [5]
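A sketch of what such a high-level performance test might look like with NUnit (the fixture, the loop count and the threshold are purely illustrative and not actual Quino tests; a real guard would be calibrated against measured baselines):
using System.Diagnostics;
using NUnit.Framework;

[TestFixture]
public class PerformanceGuards
{
    [Test]
    public void CreatingManyContextsStaysWithinBudget()
    {
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < 1000000; i++)
        {
            // Replace with the operation to guard, e.g. creating an
            // ExpressionContext for a sample object.
            var unused = new object();
        }
        stopwatch.Stop();

        // The limit must be generous enough to avoid false alarms on
        // slower build agents, but tight enough to catch a large regression.
        Assert.That(stopwatch.ElapsedMilliseconds, Is.LessThan(2000));
    }
}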
Enough theory: in part II, we’ll describe the problem in detail and take a crack at improving the speed. See you there.
Published by marco on 1. Jan 2016 22:52:49 (GMT-5)
The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.
Quino 2 is finally ready and will go out the door with a 2.1 rather than a 2.0 version number. The reason being that we released 2.0 internally and tested the hell out of it. 2.1 is the result of that testing. It includes a lot of bug fixes as well as API tweaks to make things easier for developers.
On top of that, I’ve gone through the backlog and found many issues that had either been fixed already, were obsolete or had been inadequately specified. The Quino backlog dropped from 682 to 542 issues.
- Quino.Web.Glimpse package to use the support we do have (QNO-4560)
- HtmlHelpers and other client-side rendering (QNO-3921, QNO-3995, QNO-3804, QNO-3797, QNO-3974, QNO-4001, QNO-3992, QNO-3991, QNO-3973, QNO-3970, QNO-3969, QNO-3918, QNO-3866, QNO-3865, QNO-3857, QNO-3849, QNO-3848, QNO-3842, QNO-3839, QNO-3837, QNO-3836, QNO-3834, QNO-3833, QNO-3831, QNO-3824 w/sub-tasks, QNO-3806, QNO-3805, QNO-3802, QNO-2288)
The following changes are marked with Obsolete attributes, so you’ll get a hint as to how to fix the problem. Since these are changes from an unreleased version of Quino, they cause a compile error.
- UseMetaSchemaWinformDxFeedback() has been renamed to UseMetaschemaWinformDx()
- UseSchemaMigrationSupport() has been renamed to UseIntegratedSchemaMigration()
- MetaHttpApplicationBase.MetaApplication has been renamed to BaseApplication
- The IServer.Run() extension method is no longer supported.
- GetStandardFilters, GetStandardFiltersForFormsAuthentication() and GetStandardFiltersForUnrestrictedAuthentication are no longer supported. Instead, you should register filters in the IOC and use the IWebFilterAttributeFactory.CreateFilters() to get the list of supported filters
- ToolRequirementAttribute is no longer supported or used.
- AssemblyExtensions.GetLoadableTypesWithInterface() is no longer supported
- AssemblyTools.GetValidAssembly() has been replaced with AssemblyTools.GetApplicationAssembly(); GetExecutableName() and GetExecutablePath() have been removed.
- Constants on MetaBuilderBase (e.g. EndOfTimeExpression) are obsolete. Use MetaBuilderBase.ExpressionFactory.Constants.EndOfTime instead.
- MetaObjectDescriptionExtensions are obsolete; instead, use the IMetaObjectFormatterSettings from the IOC to change settings on startup.
- GetShortDescription() has been moved to the IMetaObjectFormatter. Obtain an instance from the IOC, as usual.
Published by marco on 28. Dec 2015 10:40:24 (GMT-5)
The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.
In the beta1 and beta2 release notes, we read about changes to configuration, dependency reduction, the data driver architecture, DDL commands, security and access control in web applications and a new code-generation format.
In 2.0 final—which was actually released internally on November 13th, 2015 (a Friday)—we made the following additional improvements:
These notes are being published for completeness and documentation. The first publicly available release of Quino 2.x will be 2.1 or higher (release notes coming soon).
As we’ve mentioned before, this release is absolutely merciless in regard to backwards compatibility. Old code is not retained as Obsolete
. Instead, a project upgrading to 2.0 will encounter compile errors.
The following notes serve as an incomplete guide that will help you upgrade a Quino-based product.
As I wrote in the release notes for beta1 and beta2, if you arm yourself with a bit of time, ReSharper and the release notes (and possibly keep an Encodo employee on speed-dial), the upgrade is not difficult. It consists mainly of letting ReSharper update namespace references for you.
Instead of going through the errors (example shown to the right) one by one, you can take care of a lot of errors with the following search/replace pairs.
Encodo.Quino.Data.Persistence => Encodo.Quino.Data
IMetaApplication => IApplication
ICoreApplication => IApplication
GetServiceLocator() => GetServices()
MetaMethodTools.GetInstance => DataMetaMethodExtensions.GetInstance
application.ServiceLocator.GetInstance => application.GetInstance
Application.ServiceLocator.GetInstance => Application.GetInstance
application.ServiceLocator => application.GetServices()
Application.ServiceLocator => Application.GetServices()
application.Recorder => application.GetLogger()
Application.Recorder => Application.GetLogger()
session.GetRecorder() => session.GetLogger()
Session.GetRecorder() => Session.GetLogger()
Session.Application.Recorder => Session.GetLogger()
FileTools.Canonicalize() => PathTools.Normalize()
application.Messages => application.GetMessageList()
Application.Messages => Application.GetMessageList()
ServiceLocator.GetInstance => Application.GetInstance
MetaLayoutTools => LayoutConstants
GlobalContext.Instance.Application.Configuration.Model => GlobalContext.Instance.Application.GetModel()
IMessageRecorder => ILogger
GetUseReleaseSettings() => IsInReleaseMode()
ReportToolsDX => ReportDxExtensions
Although you can’t just search/replace everything, it gets you a long way.
These replacement pairs, while not recommended for global search/replace, are a handy guide for how the API has generally changed.
*Generator => *Builder
SetUpForModule => CreateModule
Builder.SetElementVisibility(prop, true) => prop.Show()
Builder.SetElementVisibility(prop, false) => prop.Hide()
Builder.SetElementControlIdentifier(prop, ControlIdentifiers => prop.SetInputControl(ControlIdentifiers
Builder.SetPropertyHeightInPixels(prop, 200); => prop.SetHeightInPixels(200);
Constructing a module has also changed. Instead of using the following syntax,
var module = Builder.SetUpForModule<AuditModule>(Name, "ApexClearing.Alps.Core", Name, true);
Replace it with the following direct replacement,
var module = Builder.CreateModule(Name, "ApexClearing.Alps.Core", Name);
Or use this replacement, with the recommended style for the v2 format (no more class prefix for generated classes and a standard namespace):
var module = Builder.CreateModule(Name, typeof(AuditModuleBuilder).GetParentNamespace());
Because of how the module class-names have changed, the standard module ORM classes all have different names. The formula is that the ORM class-name no longer has its module name prepended.
ReportsReportDefinition => ReportDefinition
SecurityUser => User
Furthermore, all modules have been converted to use the v2 code-generation format, which has the metadata separate from the ORM object. Therefore, instead of referencing metadata using the ORM class-name as the base, you use the module name as the base.
ReportReportDefinition.Fields.Name => ReportModule.ReportDefinition.Name.Identifier
ReportReportDefinition.MetaProperties.Name => ReportModule.ReportDefinition.Name
ReportReportDefinition.Metadata => ReportModule.ReportDefinition.Metadata
There’s an upcoming article that will show more examples of the improved flexibility and capabilities that come with the v2-metadata.
The standard action names have moved as well.
Any other, more rarely used action names have been moved back to the actions themselves, so for example SaveApplicationSettingsAction.ActionName.
If you created any actions of your own, then the API there has changed as well. As previously documented in API Design: To Generic or not Generic? (Part II), instead of overriding the following method,
protected override int DoExecute(IApplication application, ConfigurationOptions options, int currentResult)
{
return base.DoExecute(application, options, currentResult);
}
you instead override in the following way,
public override void Execute()
{
base.Execute();
}
If you’re already using Visual Studio 2015, then the NuGet UI is a good choice for managing packages. If you’re still on Visual Studio 2013, then the UI there is pretty flaky and we recommend using the console.
The examples below assume that you have configured a source called “Local Quino” (e.g. a local folder that holds the nupkg
files for Quino).
install-package Quino.Data.PostgreSql.Testing -ProjectName Punchclock.Core.Tests -Source "Local Quino"
install-package Quino.Server -ProjectName Punchclock.Server -Source "Local Quino"
install-package Quino.Console -ProjectName Punchclock.Server -Source "Local Quino"
install-package Quino.Web -ProjectName Punchclock.Web.API -Source "Local Quino"
We recommend using Visual Studio 2015 if at all possible. Visual Studio 2013 is also supported, but we have all migrated to 2015 and our knowhow about 2013 and its debugging idiosyncrasies will deteriorate with time.
These are just brief points of interest to get you set up. As with the NuGet support, these instructions are subject to change as we gain more experience with debugging with packages as well.
- Quino.zip (as part of the package release)
Quino packages are no different than any other NuGet packages. We provide both standard packages as well as packages with symbols and sources. Any complications you encounter with them are due to the whole NuGet experience still being a bit in-flux in the .NET world.
An upcoming post will provide more detail and examples.
We generally use our continuous integration server to create packages, but you can also create packages locally (it’s up to you to make sure the version number makes sense, so be careful). These instructions are approximate and are subject to change. I provide them here to give you an idea of how packages are created. If they don’t work, please contact Encodo for help.
- Switch to the %QUINO_ROOT%\src directory
- Run nant build pack to build Quino and packages
- Add %QUINO_ROOT%\nuget as a package source (one-time only)
- Run nant nuget from your project directory to get the latest Quino build from your local folder
Published by marco on 6. Dec 2015 11:57:57 (GMT-5)
These days nobody who’s anybody in the software-development world is writing software without tests. Just writing them doesn’t help make the software better, though. You also need to be able to execute tests—reliably and quickly and repeatably.
That said, you’ll have to get yourself a test runner, which is a different tool from the compiler or the runtime. That is, just because your tests compile (satisfy all of the language rules) and could be executed doesn’t mean that you’re done writing them yet.
Every testing framework has its own rules for how the test runner selects methods for execution as tests. The standard configuration options are:
Each testing framework will offer different ways of configuring your code so that the test runner can find and execute setup/test/teardown code. To write NUnit tests, you decorate classes, methods and parameters with C# attributes.
The standard scenario is relatively easy to execute—run all methods with a Test
attribute in a class with a TestFixture
attribute on it.
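For example, a minimal NUnit fixture looks something like this (the class and member names are only illustrative):
using NUnit.Framework;

[TestFixture]
public class CalculatorTests
{
    [SetUp]
    public void SetUp()
    {
        // Runs before each test method in this fixture.
    }

    [Test]
    public void AddingTwoNumbersWorks()
    {
        Assert.AreEqual(4, 2 + 2);
    }
}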
When you consider multiple base classes and generic type arguments, each of which may also have NUnit attributes, things get a bit less clear. In that case, not only do you have to know what NUnit offers as possibilities but also whether the test runner that you’re using also understands and implements the NUnit specification in the same way. Not only that, but there are legitimate questions for which even the best specification does not provide answers.
At Encodo, we use Visual Studio 2015 with ReSharper 9.2 and we use the ReSharper test runner. We’re still looking into using the built-in VS test runner—the continuous-testing integration in the editor is intriguing [1]—but it’s quite weak when compared to the ReSharper one.
So, not only do we have to consider what the NUnit documentation says is possible, but we must also know how the R# test runner interprets the NUnit attributes and what is supported.
Where is there room for misunderstanding? A few examples,
- A TestFixture attribute on an abstract class?
- A TestFixture attribute on a class with generic parameters?
- Tests but no TestFixture attribute?
- Tests but no TestFixture attribute, but there are non-abstract descendants that do have a TestFixture attribute?
In our case, the answer to these questions depends on which version of R# you’re using. Even though it feels like you configured everything correctly and it logically should work, the test runner sometimes disagrees.
Throw the TeamCity test runner into the mix—which is ostensibly the same as that from R# but still subtly different—and you’ll have even more fun.
At any rate, now that you know the general issue, I’d like to share how the ground rules we’ve come up with that avoid all of the issues described above. The text below comes from the issue I created for the impending release of Quino 2.
Non-leaf-node base classes should never appear as nodes in test runners. A user should be able to run tests in descendants directly from a fixture or test in the base class.
Non-leaf-node base classes are shown in the R# test runner in both versions 9 and 10. A user must navigate to the descendant to run a test. The user can no longer run all descendants or a single descendant directly from the test.
Relatively recently, in order to better test a misbehaving test runner and accurately report issues to JetBrains, I standardized all tests to the same pattern:
- TestFixture attribute only on leaf nodes
This worked just fine with ReSharper 8.x but causes strange behavior in both R# 9.x and 10.x. We discovered recently that not only did the test runner act strangely (something that they might fix), but also that the unit-testing integration in the files themselves behaved differently when the base class is abstract (something JetBrains is unlikely to fix).
You can see that R# treats a non-abstract class with tests as a testable entity, even when it doesn’t actually have a TestFixture
attribute and even expects a generic type parameter in order to instantiate.
Here it’s not working well in either the source file or the test runner. In the source file, you can see that it offers to run tests in a category, but not the tests from actual descendants. If you try to run or debug anything from this menu, it shows the fixture with a question-mark icon and marks any tests it manages to display as inconclusive. This is not surprising, since the test fixture may not be abstract, but does require a type parameter in order to be instantiated.
Here it looks and acts correctly:
I’ve reported this issue to JetBrains, but our testing structure either isn’t very common or it hasn’t made it to their core test cases, because neither 9 nor 10 handles them as well as the 8.x runner did.
Now that we’re also using TeamCity a lot more to not only execute tests but also to collect coverage results, we’ll capitulate and just change our patterns to whatever makes R#/TeamCity the happiest.
Once more to recap our ground rules for making tests:
- TestFixture only on leafs (classes with no descendants)
- Category or Test attributes anywhere in the hierarchy, but you need to declare the class as abstract.
When you make the change, you can see the improvement immediately. A sketch of this pattern is shown below.
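Here is a small sketch of that pattern (class names are hypothetical): the shared tests live in an abstract base class without a TestFixture attribute, and only the concrete leaf class is marked as a fixture.
using NUnit.Framework;

// Abstract base: no TestFixture attribute; it declares tests that all
// descendants inherit, but never appears as a runnable node itself.
public abstract class DataDriverTestsBase
{
    protected abstract string CreateConnectionString();

    [Test]
    public void ConnectionStringIsNotEmpty()
    {
        Assert.That(CreateConnectionString(), Is.Not.Empty);
    }
}

// Leaf fixture: the only class the test runner should show and execute.
[TestFixture]
public class PostgreSqlDataDriverTests : DataDriverTestsBase
{
    protected override string CreateConnectionString()
    {
        return "Server=localhost;Database=test";
    }
}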
Published by marco on 28. Nov 2015 13:58:45 (GMT-5)
Updated by marco on 19. May 2017 15:18:20 (GMT-5)
As part of the final release process for Quino 2, we’ve upgraded 5 solutions [1] from Quino 1.13 to the latest API in order to shake out any remaining API inconsistencies or even just inelegant or clumsy calls or constructs. A lot of questions came up during these conversions, so I wrote the following blog to provide detail on the exact workings and execution order of a Quino application.
I’ve discussed the design of Quino’s configuration before, most recently in API Design: Running an Application (Part I) and API Design: To Generic or not Generic? (Part II) as well as the three-part series that starts with Encodo’s configuration library for Quino: part I.
The life-cycle of a Quino 2.0 application breaks down into roughly the following stages:
- ServicesInitialized
- ServicesConfigured action
The first stage is all about putting the application together with calls to Use
various services and features. This stage is covered in detail in three parts, starting with Encodo’s configuration library for Quino: part I.
Let’s tackle this one last because it requires a bit more explanation.
Technically, an application can add code to this stage by adding an IApplicationAction
before the ServicesConfigured
action. Use the Configure<TService>()
extension method in stage 1 to configure individual services, as shown below.
application.Configure<IFileLogSettings>(
s => s.Behavior = FileLogBehavior.MultipleFiles
);
The execution stage is application-specific. This stage can be short or long, depending on what your application does.
For desktop applications or single-user utilities, stage 4 is executed in application code, as shown below, in the Run
method, which is called by the ApplicationManager
after the application has started.
var transcript = new ApplicationManager().Run(CreateApplication, Run);
IApplication CreateApplication() { … }
void Run(IApplication application) { … }
If your application is a service, like a daemon or a web server or whatever, then you’ll want to execute stages 1–3 and then let the framework send requests to your application’s running services. When the framework sends the termination signal, execute stage 5 by disposing of the application. Instead of calling Run
, you’ll call CreateAndStartupUp
.
var application = new ApplicationManager().CreateAndStartUp(CreateApplication);
IApplication CreateApplication() { … }
Every application has certain tasks to execute during shutdown. For example, an application will want to close down any open connections to external resources, close files (especially log files) and perhaps inform the user of shutdown.
Instead of exposing a specific “shutdown” method, a Quino 2.0 application can simply be disposed to shut it down.
If you use ApplicationManager.Run()
as shown above, then you’re already sorted—the application will be disposed and the user will be informed in case of catastrophic failure; otherwise, you can shut down and get the final application transcript from the disposed object.
application.Dispose();
var transcript = application.GetTranscript();
// Do something with the transcript…
We’re finally ready to discuss stage 2 in detail.
An IOC has two phases: in the first phase, the application registers services with the IOC; in the second phase, the application uses services from the IOC.
An application should use the IOC as much as possible, so Quino keeps stage 2 as short as possible. Because it can’t use the IOC during the registration phase, code that runs in this stage shares objects via a poor-man’s IOC built into the IApplication
that allows modification and only supports singletons. Luckily, very little end-developer application code will ever need to run in this stage. It’s nevertheless interesting to know how it works.
Obviously, any code in this stage that uses the IOC will cause it to switch from phase one to phase two and subsequent attempts to register services will fail. Therefore, while application code in stage 2 has to be careful, you don’t have to worry about not knowing you’ve screwed up.
Why would we have this stage? Some advocates of using an IOC claim that everything should be configured in code. However, it’s not uncommon for applications to want to run very differently based on command-line or other configuration parameters. The Quino startup handles this by placing the following actions in stage 2:
An application is free to insert more actions before the ServicesInitialized
action, but they have to play by the rules outlined above.
Code in stage 2 shares objects by calling SetSingle()
and GetSingle()
. There are only a few objects that fall into this category.
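For example, sharing one of these objects might look something like the following sketch (the generic signatures of SetSingle() and GetSingle() and the RunSettings class are assumptions, not the documented API):
// Hypothetical stage-2 usage: an early action stores a singleton in the
// poor-man's IOC; a later action retrieves it before the real IOC exists.
application.SetSingle<IRunSettings>(new RunSettings());
var runSettings = application.GetSingle<IRunSettings>();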
The calls UseCore()
and UseApplication()
register most of the standard objects used in stage 2. Actually, while they’re mostly used during stage 2, some of them are also added to the poor man’s IOC in case of catastrophic failure, in which case the IOC cannot be assumed to be available. A good example is the IApplicationCrashReporter
.
Before listing all of the objects, let’s take a rough look at how a standard application is started. The following steps outline what we consider to be a good minimum level of support for any application. Of course, the Quino configuration is modular, so you can take as much or as little as you like, but while you can use a naked Application
—which has absolutely nothing registered—and you can call UseCore()
to have a bit more—it registers a handful of low-level services but no actions—we recommend calling at least UseApplication()
to add most of the functionality outlined below.
- The RunMode is read from the IRunSettings to determine if the application should catch all exceptions or let them go to the debugger. This involves getting the IRunSettings from the application and getting the final value using the IApplicationManagerPreRunFinalizer. This is commonly an implementation that allows setting the value of RunMode from the command-line in debug builds. This further depends on the ICommandSetManager (which depends on the IValueTools) and possibly the ICommandLineSettings (to set the CommandLineConfigurationFilename if it was set by the user).
- The command line is processed into an ICommandProcessingResult, possibly setting other values and adding other configuration steps to the list of startup actions (e.g. many command-line options are switches that are handled by calling Configure<TSettings>() where TSettings is the configuration object in the IOC to modify).
- Configuration data is loaded according to the IConfigurationDataSettings, involving the ILocationManager to find configuration files and the ITextValueNodeReader to read them.
- The ILogger is used throughout by various actions to log application behavior
- The IApplicationCrashReporter uses the IFeedback or the ILogger to notify the user and log the error
- The IInMemoryLogger is used to include all in-memory messages in the IApplicationTranscript
The next section provides detail on each of the individual objects referenced in the workflow above.
You can get any one of these objects from the IApplication
in at least two ways, either by using GetSingle<TService>()
(safe in all situations) or GetInstance<TService>()
(safe only in stage 3 or later) or there’s almost always a method which starts with “Get” and ends in the service name.
The example below shows how to get the ICommandSetManager
[2] if you need it.
application.GetCommandSetManager();
application.GetSingle<ICommandSetManager>(); // Prefer the one above
application.GetInstance<ICommandSetManager>();
All three calls return the exact same object, though. The first two from the poor-man’s IOC; the last from the real IOC.
Only applications that need access to low-level objects or need to mess around in stage 2 need to know which objects are available where and when. Most applications don’t care and will just always use GetInstance()
.
The objects in the poor-man’s IOC are listed below.
- IValueTools: converts values; used by the command-line parser, mostly to translate enumerated values and flags
- ILocationManager: an object that manages aliases for file-system locations, like “Configuration”, from which configuration files should be loaded, or “UserConfiguration”, where user-specific overlay configuration files are stored; used by the configuration loader
- ILogger: a reference to the main logger for the application
- IInMemoryLogger: a reference to an in-memory message store for the logger (used by the ApplicationManager to retrieve the message log from a crashed application)
- IMessageFormatter: a reference to the object that formats messages for the logger
- ICommandSetManager: sets the schema for a command line; used by the command-line parser
- ICommandProcessingResult: contains the result of having processed the command line
- ICommandLineSettings: defines the properties needed to process the command line (e.g. the Arguments and CommandLineConfigurationFilename, which indicates the optional filename to use for configuration in addition to the standard ones)
- IConfigurationDataSettings: defines the ConfigurationData, which is the hierarchical representation of all configuration data for the application, as well as the MainConfigurationFilename from which this data is read; used by the configuration-loader
- ITextValueNodeReader: the object that knows how to read ConfigurationData from the file formats supported by the application [3]; used by the configuration-loader
- IRunSettings: an object that manages the RunMode (“release” or “debug”), which can be set from the command line and is used by the ApplicationManager to determine whether to use global exception-handling
- IApplicationManagerPreRunFinalizer: a reference to an object that applies any options from the command line before the decision of whether to execute in release or debug mode is taken.
- IApplicationCrashReporter: used by the ApplicationManager in the code surrounding the entire application execution and therefore not guaranteed to have a usable IOC available
- IApplicationDescription: used together with the ILocationManager to set application-specific aliases to user-configuration folders (e.g. AppData\{CompanyTitle}\{ApplicationTitle})
- IApplicationTranscript: an object that records the last result of having run the application; returned by the ApplicationManager after Run() has completed, but also available through the application object returned by CreateAndStartUp() to indicate the state of the application after startup.
Each of these objects has a very compact interface and has a single responsibility. An application can easily replace any of these objects by calling UseSingle()
during stage 1 or 2. This call sets the object in both the poor-man’s IOC as well as the real one. For those rare cases where a non-IOC singleton needs to be set after the IOC has been finalized, the application can call SetSingle()
, which does not touch the IOC. This feature is currently used only to set the IApplicationTranscript
, which needs to happen even after the IOC registration is complete.
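To make that concrete, a replacement might look something like this rough sketch (the exact signature of UseSingle() and the custom logger class are assumptions):
// Hypothetical: swap in a custom logger during stage 1 or 2. As described
// above, UseSingle() sets the object in both the poor-man's IOC and the
// real IOC; the generic signature shown here is assumed.
var logger = new MyCustomLogger(); // hypothetical ILogger implementation
application.UseSingle<ILogger>(logger);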
Two large customer solutions, two medium-sized internal solutions (Punchclock and JobVortex) as well as the Demo/Sandbox solution. These solutions include the gamut of application types:
I originally used ITextValueNodeReader
as an example, but that’s one case where the recommended call doesn’t match 1-to-1 with the interface name.
application.GetSingle<ITextValueNodeReader>();
application.GetInstance<ITextValueNodeReader>();
application.GetConfigurationDataReader(); // Recommended
Published by marco on 23. Nov 2015 22:31:29 (GMT-5)
Quino has long included support for connecting to an application server instead of connecting directly to databases or other sources. The application server uses the same model as the client and provides modeled services (application-specific) as well as CRUD for non-modeled data interactions.
We wrote the first version of the server in 2008. Since then, it’s acquired better authentication and authorization capabilities as well as routing and state-handling. We’ve always based it on the .NET HttpListener
.
As late as Quino 2.0-beta2 (which we had deployed in production environments already), the server hierarchy looked like screenshot below, pulled from issue QNO-4927:
This screenshot was captured after a few unneeded interfaces had already been removed. As you can see by the class names, we’d struggled heroically to deal with the complexity that arises when you use inheritance rather than composition.
The state-handling was welded onto an authentication-enabled server, and the base machinery for supporting authentication was spread across three hierarchy layers. The hierarchy only hints at composition in its naming: the “Stateful” part of the class name CoreStatefulHttpServerBase<TState>
had already been moved to a state provider and a state creator in previous versions. That support is unchanged in the 2.0 version.
We mentioned above that implementation was “spread across three hierarchy layers”. There’s nothing wrong with that, in principle. In fact, it’s a good idea to encapsulate higher-level patterns in a layer that doesn’t introduce too many dependencies and to introduce dependencies in other layers. This allows applications not only to be able to use a common implementation without pulling in unwanted dependencies, but also to profit from the common tests that ensure the components works as advertised.
In Quino, the following three layers are present in many components:
- A layer that doesn’t introduce too many dependencies (Encodo.Core).
- A layer that introduces more dependencies (Encodo.Application, Encodo.Connections and so on)
- A metadata-based layer (Quino.Meta, Quino.Application and so on).
The diagram below shows the new hotness in Quino 2. [2]
The hierarchy is now extremely flat. There is an IServer
interface and a Server
implementation, both generic in TListener
, of type IServerListener
. The server manages a single instance of an IServerListener
.
The listener, in turn, has an IHttpServerRequestHandler
, the main implementation of which uses an IHttpServerAuthenticator
.
As mentioned above, the IServerStateProvider
is included in this diagram, but is unchanged from Quino 2.0-beta3, except that it is now used by the request handler rather than directly by the server.
You can see how the abstract layer is enhanced by an HTTP-specific layer (the Encodo.Server.Http
namespace) and the metadata-specific layer is nice encapsulated in three classes in the Quino.Server
assembly.
This type hierarchy has decoupled the main elements of the workflow of handling requests for a server:
It is important to note that this behavior is unchanged from the previous version—it’s just that now each step is encapsulated in its own component. The components are small and easily replaced, with clear and concise interfaces.
Note also that the current implementation of the request handler is for HTTP servers only. Should the need arise, however, it would be relatively easy to abstract away the HttpListener
dependency and generalize most of the logic in the request handler for any kind of server, regardless of protocol and networking implementation. Only the request handler is affected by the HTTP dependency, though: authentication, state-provision and listener-management can all be re-used as-is.
Also of note is that the only full-fledged implementation is for metadata-based applications. At the bottom of the diagram, you can see the metadata-specific implementations for the route registry, state provider and authenticator. This is reflected in the standard registration in the IOC.
These are the service registrations from Encodo.Server
:
return handler
.RegisterSingle<IServerSettings, ServerSettings>()
.RegisterSingle<IServerListenerFactory<HttpServerListener>, HttpServerListenerFactory>()
.Register<IServer, Server<HttpServerListener>>();
And these are the service registrations from Quino.Server
:
handler
.RegisterSingle<IServerRouteRegistry<IMetaServerState>, StandardMetaServerRouteRegistry>()
.RegisterSingle<IServerStateProvider<IMetaServerState>, MetaPersistentServerStateProvider>()
.RegisterSingle<IServerStateCreator<IMetaServerState>, MetaServerStateCreator>()
.RegisterSingle<IHttpServerAuthenticator<IMetaServerState>, MetaHttpServerAuthenticator>()
.RegisterSingle<IHttpServerRequestHandler, HttpServerRequestHandler<IMetaServerState>>()
As you can see, the registration is extremely fine-grained and allows very precise customization as well as easy mocking and testing.
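For example, a product that needs its own authentication could override just that one registration and leave the rest of the standard pipeline alone (the custom class name here is hypothetical):
// Hypothetical: replace only the authenticator; the listener, request
// handler and state provider keep their standard registrations.
handler.RegisterSingle<IHttpServerAuthenticator<IMetaServerState>, ApiKeyHttpServerAuthenticator>();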
Published by marco on 16. Oct 2015 11:44:35 (GMT-5)
In the previous article, we discussed the task of Splitting up assemblies in Quino using NDepend. In this article, I’ll discuss both the high-level and low-level workflows I used with NDepend to efficiently clear up these cycles.
Please note that what follows is a description of how I have used the tool—so far—to get my very specific tasks accomplished. If you’re looking to solve other problems or want to solve the same problems more efficiently, you should take a look at the official NDepend documentation.
To recap briefly: we are reducing dependencies among top-level namespaces in two large assemblies, in order to be able to split them up into multiple assemblies. The resulting assemblies will have dependencies on each other, but the idea is to make at least some parts of the Encodo/Quino libraries opt-in.
On a high-level, I tackled the task in the following loosely defined phases.
Even once you’ve gotten rid of all cycles, you may still have unwanted dependencies that hinder splitting namespaces into the desired constellation of assemblies.
For example, the plan is to split all logging and message-recording into an assembly called Encodo.Logging
. However, the IRecorder
interface (with a single method, Log()
) is used practically everywhere. It quickly becomes necessary to split interfaces and implementation—with many more potential dependencies—into two assemblies for some very central interfaces and support classes. In this specific case, I moved IRecorder
to Encodo.Core
.
Even after you’ve conquered the black hole, you might still have quite a bit of work to do. Never fear, though: NDepend is there to help root out those dependencies as well.
Because we can split off smaller assemblies regardless, these dependencies are less important to clean up for our current purposes. However, once this code is packed into its own assembly, its namespaces become root namespaces of their own and—voila! you have more potentially nasty dependencies to deal with. Granted, the problem is less severe because you’re dealing with a logically smaller component.
In Quino, we use non-root namespaces more for organization and less for defining components. Still, cycles are cycles, and they’re worth examining, at least to pluck the low-hanging fruit.
With the high-level plan described above in hand, I repeated the following steps for the many dependencies I had to untangle. Don’t despair if it looks like your library has a ton of unwanted dependencies. If you’re smart about the ones you untangle first, you can make excellent—and, most importantly, rewarding—progress relatively quickly. [1]
GOTO 1
The high-level plan of attack sounded interesting, but might have left you cold with its abstraction. Then there was the promise of detail with a focus on root-level namespaces, but alas, you might still be left wondering just how exactly do you reduce these much-hated cycles?
I took some screenshots as I worked on Quino, to document my process and point out parts of NDepend I thought were eminently helpful.
I mentioned above that you should “[k]eep zooming in”, but how do you do that? A good first step is to zoom all the way out and show only direct namespace dependencies. This focuses only on using
references instead of the much-more frequent member accesses. In addition, I changed the default setting to show dependencies in only one direction—when a column references a row (blue), but not vice versa (green).
As you can see, the diagrams are considerably less busy than the one shown above. Here, we can see a few black spots that indicate cycles, but it’s not so many as to be overwhelming. [2] You can hover over the offending squares to show more detail in a popup.
If you don’t see any more cycles between namespaces, switch the detail level to “Members”. Another very useful feature is to “Bind Matrix”, which forces the columns and rows to be shown in the same order and concentrates the cycles in a smaller area of the matrix.
As you can see in the diagram, NDepend then highlights the offending area and you can even click the upper-left corner to focus the matrix only on that particular cycle.
Once you’re looking at members, it isn’t enough to know just the namespaces involved—you need to know which types are referencing which types. The powerful matrix view lets you drill down through namespaces to show classes as well.
If your classes are large—another no-no, but one thing at a time—then you can drill down to show which method is calling which method to create the cycle. In the screenshot to the right, you can see where I had to do just that in order to finally figure out what was going on.
In that screenshot, you can also see something that I only discovered after using the tool for a while: the direction of usage is indicated with an arrow. You can turn off the tooltips—which are informative, but can be distracting for this task—and you don’t have to remember which color (blue or green) corresponds to which direction of usage.
Once you’ve drilled your way down from namespaces-only to showing member dependencies, to focusing on classes, and even members, your diagram should be shaping up quite well.
On the right, you’ll see a diagram of all direct dependencies for the remaining area with a problem. You don’t see any black boxes, which means that all direct dependencies are gone. So we have to turn up the power of our microscope further to show indirect dependencies.
On the left, you can see that the scary, scary black hole from the start of our journey has been whittled down to a small, black spot. And that’s with all direct and indirect dependencies as well as both directions of usage turned on (i.e. the green boxes are back). This picture is much more pleasing, no?
For the last cluster of indirect dependencies shown above, I had to unpack another feature, NDepend queries: you can select any element and run a query to show using/used-by assemblies/namespaces. [3] The results are shown in a panel, where you can edit the query and see live updates immediately.
Even with a highly zoomed-in view on the cycle, I still couldn’t see the problem, so I took NDepend’s suggestion and generated a graph of the final indirect dependency between Culture and Enums (through Expression). At this zoom level, the graph becomes more useful (for me) and illuminates problems that remain muddy in the matrix (see right).
In order to finish the job efficiently, here are a handful of miscellaneous tips that are useful, but didn’t fit into the guide above.
And BOOM! just like that [4], phase 1 (root namespaces) for Encodo was complete! Now, on to Quino.dll…
Depending on what shape your library is in, do not underestimate the work involved. Even with NDepend riding shotgun and barking out the course like a rally navigator, you still have to actually make the changes. That means lots of refactoring, lots of building, lots of analysis, lots of running tests and lots of reviews of at-times quite-sweeping changes to your code base. The destination is worth the journey, but do not embark on it lightly—and don’t forget to bring the right tools. [5]
Published by marco on 4. Oct 2015 07:44:39 (GMT-5)
A lot of work has been put into Quino 2.0 [1], with almost no stone left unturned. Almost every subsystem has been refactored and simplified, including but not limited to the data driver, the schema migration, generated code and metadata, model-building, security and authentication, service-application support and, of course, configuration and execution.
Two of the finishing touches before releasing 2.0 are to reorganize all of the code into a more coherent namespace structure and to reduce the size of the two monolithic assemblies: Encodo and Quino.
The first thing to establish is: why are we doing this? Why do we want to reduce dependencies and reduce the size of our assemblies? There are several reasons, but a major reason is to improve the discoverability of patterns and types in Quino. Two giant assemblies are not inviting—they are, in fact, daunting. Replace these assemblies with dozens of smaller ones and users of your framework will be more likely to (A) find what they’re looking for on their own and (B) build their own extensions with the correct dependencies and patterns. Neither of these is guaranteed, but smaller modules are a great start.
Another big reason is portability. .NET Core was released as open-source software some time ago and more and more .NET source code is added to it each day. There are portable targets, non-Windows targets, Universal-build targets and much more. It makes sense to split code up into highly portable units with as few dependencies as possible. That is, the dependencies should be explicit and intended.
Not only that, but NuGet packaging has come to the fore more than ever. Quino was originally designed to keep third-party boundaries clear, but we wanted to make it as easy as possible to use Quino. Just include Encodo and Quino and off you went. However, with NuGet, you can now say you want to use Quino.Standard and you’ll get Quino.Core, Encodo.Core, Encodo.Services.SimpleInjector, Quino.Services.SimpleInjector and other packages.
With so much interesting code in the Quino framework, we want to make it available as much as possible not only for our internal projects but also for customer projects where appropriate and, also, possibly for open-source distribution.
I’ve used NDepend before [2] to clean up dependencies. However, the last analysis I did about a year ago showed quite deep problems [3] that needed to be addressed before any further dependency analysis could bear fruit at all. With that work finally out of the way, I’m ready to re-engage with NDepend and see where we stand with Quino.
As luck would have it, NDepend is in version 6, released at the start of summer 2015. As was the case last year, NDepend has generously provided me with an upgrade license to allow me to test and evaluate the new version with a sizable and real-world project.
Here is some of the feedback I sent to NDepend (Twitter):
- I really, really like the depth of insight NDepend gives me into my code. I find myself thinking “SOLID” much more often when I have NDepend shaking its head sadly at me, tsk-tsking at all of the dependency snarls I’ve managed to build.
- It’s fast and super-reliable. I can work these checks into my workflow relatively easily.
- I’m using the matrix view a lot more than the graphs because even NDepend recommends I don’t use a graph for the number of namespaces/classes I’m usually looking at
- Where the graph view is super-useful is for examining *indirect* dependencies, which are harder to decipher in the matrix
- I’ve found so many silly mistakes/lazy decisions that would lead to confusion for developers new to my framework
- I’m spending so much time with it and documenting my experiences because I want more people at my company to use it
- I haven’t even scratched the surface of the warnings/errors but want to get to that, as well (the Dashboard tells me of 71 rules violated; 9 critical; I’m afraid to look :-)
Before I get more in-depth with NDepend, please note that there are at least two main use cases for this tool [4]:
These two use cases are vastly different. The first is like cleaning a gas-station bathroom for the first time in years; the second is more like the weekly once-over you give your bathroom at home. The tools you’ll need for the two jobs are similar, but quite different in scope and power. The same goes for NDepend: how you’ll use it to claw your way back to architectural purity is different than how you’ll use it to occasionally clean up an already mostly-clean project.
Quino is much better than it was the last time we peeked under the covers with NDepend, but we’re still going to need a bucket of industrial cleaner before we’re done. [5]
The first step is to make sure that you’re analyzing the correct assemblies. Show the project properties to see which assemblies are included. You should remove all assemblies from consideration that don’t currently interest you (especially if your library is not quite up to snuff, dependency-wise; afterwards, you can leave as many clean assemblies in the list as you like). [6]
Running an analysis with NDepend 6 generates a nice report, which includes the following initial dependency graph for the assemblies.
As you can see, Encodo and Quino depend only on system assemblies, but there are components that pull in other references where they might not be needed. The initial dependency matrices for Encodo and Quino both look much better than they did when I last generated one. The images below show what we have to work with in the Encodo and Quino assemblies.
It’s not as terrible as I’ve made out, right? There is far less namespace-nesting, so it’s much easier to see where the bidirectional dependencies are. There are only a handful of cyclic dependencies in each library, with Encodo edging out Quino because of (A) the nature of the code and (B) the extra effort I’d already put into Encodo so far.
I’m not particularly surprised to see that this is relatively clean because we’ve put effort into keeping the external dependencies low. It’s the internal dependencies in Encodo and Quino that we want to reduce.
The goal, as stated in the title of this article, is to split Encodo and Quino into separate assemblies. While removing cyclic dependencies is required for such an operation, it’s not sufficient. Even without cycles, it’s still possible that a given assembly is too dependent on other assemblies.
Before going any farther, I’m going to list the assemblies we’d like to have. By “like to have”, I mean the list that we’d originally planned plus a few more that we added while doing the actual splitting. [7] The images on the right show the assemblies in Encodo, Quino and a partial overview of the dependency graph (calculated with the ReSharper Architecture overview rather than with NDepend, just for variety).
Of these, the following assemblies and their dependencies are of particular interest [8]:
Encodo.Core
Encodo.Core
and Encodo.Expressions
Encodo.Application
and Quino.Meta
Quino.Application
and some Encodo.* assemblies
Quino.Data
This seems like a good spot to stop, before getting into the nitty-gritty detail of how we used NDepend in practice. In the next article, I’ll discuss both the high-level and low-level workflows I used with NDepend to efficiently clear up these cycles. Stay tuned!
Release notes for 2.0 betas:
Articles about design:
I published a two-parter in August and November of 2014.
Here I’m going to give you a tip that confused me for a while, but that I think was due to particularly bad luck and is actually quite a rare occurrence.
If you already see the correct assemblies in the list, you should still check that NDepend picked up the right paths. That is, if you haven’t followed the advice in NDepend’s white paper and still have a different bin
folder for each assembly, you may see something like the following in the tooltip when you hover over the assembly name:
“Several valid .NET assemblies with the name {Encodo} have been found. They all have the same version. The one with the biggest file has been chosen.”
If NDepend has accidentally found an older copy of your assembly, you must delete that assembly. Even if you add an assembly directly, NDepend will not honor the path from which you added it. This isn’t as bad as it sounds, since it’s a very strange constellation of circumstances that led to this assembly hanging around anyway:
I only noticed because I knew I didn’t have that many dependency cycles left in the Encodo assembly.
Encodo.Application.
Published by marco on 26. Sep 2015 11:27:08 (GMT-5)
Updated by marco on 15. Jan 2017 23:18:41 (GMT-5)
In this article, I’m going to continue the discussion started in Part I, where we laid some groundwork about the state machine that is the startup/execution/shutdown feature of Quino. As we discussed, this part of the API still suffers from “several places where generic TApplication parameters [are] cluttering the API”. In this article, we’ll take a closer look at different design approaches to this concrete example—and see how we decided whether to use generic type parameters.
Any decision you take with a non-trivial API is going to involve several stakeholders and aspects. It’s often not easy to decide which path is best for your stakeholders and your product.
For any API you design, consider how others are likely to extend it—and whether your pattern is likely to deteriorate from neglect. Even a very clever solution has to be balanced with simplicity and elegance if it is to have a hope in hell of being used and standing the test of time.
In Quino 2.0, the focus has been on ruthlessly eradicating properties on the IApplication
interface as well as getting rid of the descendant interfaces, ICoreApplication
and IMetaApplication
. Because Quino now uses a pattern of placing sub-objects in the IOC associated with an IApplication
, there is far less need for a generic TApplication
parameter in the rest of the framework. See Encodo’s configuration library for Quino: part I for more information and examples.
This focus raised an API-design question: if we no longer want descendant interfaces, should we eliminate parameters generic in that interface? Or should we continue to support generic parameters for applications so that the caller will always get back the type of application that was passed in?
Before getting too far into the weeds [1], let’s look at a few concrete examples to illustrate the issue.
As discussed in Encodo’s configuration library for Quino: part III in detail, Quino applications are configured with the “Use*” pattern, where the caller includes functionality in an application by calling methods like UseRemoteServer()
or UseCommandLine()
. The latest version of this API pattern in Quino recommends returning the application that was passed in to allow chaining and fluent configuration.
For example, the following code chains the aforementioned methods together without creating a local variable or other clutter.
return new CodeGeneratorApplication().UseRemoteServer().UseCommandLine();
What should the return type of such standard configuration operations be? Taking a method above as an example, it could be defined as follows:
public static IApplication UseCommandLine(this IApplication application, string[] args) { … }
This seems like it would work fine, but the original type of the application that was passed in is lost, which is not exactly in keeping with the fluent style. In order to maintain the type, we could define the method as follows:
public static TApplication UseCommandLine<TApplication>(this TApplication application, string[] args)
where TApplication : IApplication
{ … }
This style is not as succinct but has the advantage that the caller loses no type information. On the other hand, it’s more work to define methods in this way and there is a strong likelihood that many such methods will simply be written in the style in the first example.
Why would other coders do that? Because it’s easier to write code without generics, and because the stronger result type is not needed in 99% of the cases. If every configuration method expects and returns an IApplication
, then the stronger type will never come into play. If the compiler isn’t going to complain, you can expect a higher rate of entropy in your API right out of the gate.
One way the more-derived type would come in handy is if the caller wanted to define the application-creation method with their own type as a result, as shown below:
private static CodeGeneratorApplication CreateApplication()
{
return new CodeGeneratorApplication().UseRemoteServer().UseCommandLine();
}
If the library methods expect and return IApplication
values, the result of UseCommandLine()
will be IApplication
and requires a cast to be used as defined above. If the library methods are defined generic in TApplication
, then everything works as written above.
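For comparison, here is roughly what the caller’s factory method ends up looking like if the configuration methods use the non-generic signatures; the explicit cast is the price paid for the simpler library code. This is an illustrative sketch rather than code from Quino.

private static CodeGeneratorApplication CreateApplication()
{
  // UseRemoteServer() and UseCommandLine() return IApplication in this variant,
  // so the caller has to cast the result back to the type it just created.
  return (CodeGeneratorApplication)new CodeGeneratorApplication()
    .UseRemoteServer()
    .UseCommandLine();
}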
This is definitely an advantage, in that the user gets the exact type back that they created. Generics definitely offer advantages, but it remains to be seen how much those advantages are worth. [2]
IApplicationManager
Before we examine the pros and cons further, let’s look at another example.
In Quino 1.x, applications were created directly by the client program and passed into the framework. In Quino 2.x, the IApplicationManager
is responsible for creating and executing applications. A caller passes in two functions: one to create an application and another to execute an application.
A standard application startup looks like this:
new ApplicationManager().Run(CreateApplication, RunApplication); [3]
Generic types can trigger an avalanche of generic parameters™ throughout your code.
The question is: what should the types of the two function parameters be? Does CreateApplication
return an IApplication
or a caller-specific derived type? What is the type of the application parameter passed to RunApplication
? Also IApplication
? Or the more derived type returned by CreateApplication
?
As with the previous example, if the IApplicationManager
is to return a derived type, then it must be generic in TApplication
and both function parameters will be generically typed as well. These generic types will trigger an avalanche of generic parameters™ throughout the other extension methods, interfaces and classes involved in initializing and executing applications.
That sounds horrible. This sounds like a pretty easy decision. Why are we even considering the alternative? Well, because it can be very advantageous if the application can declare RunApplication
with a strictly typed signature, as shown below.
private static void RunApplication(CodeGeneratorApplication application) { … }
Neat, right? I’ve got my very own type back.
However, if the IApplicationManager
is to call this function, then the signature of CreateAndStartUp()
and Run()
have to be generic, as shown below.
TApplication CreateAndStartUp<TApplication>(
  Func<IApplicationCreationSettings, TApplication> createApplication
)
  where TApplication : IApplication;

IApplicationExecutionTranscript Run<TApplication>(
  Func<IApplicationCreationSettings, TApplication> createApplication,
  Action<TApplication> run
)
  where TApplication : IApplication;
These are quite messy—and kinda scary—signatures. [4] If these core methods are already so complex, any other methods involved in startup and execution would have to be equally complex—including helper methods created by calling applications. [5]
The advantage here is that the caller will always get back the type of application that was created. The compiler guarantees it. The caller is not obliged to cast an IApplication
back up to the original type. The disadvantage is that all of the library code is infected by a generic <TApplication> parameter with its attendant IApplication
generic constraint. [6]
The title of this section seems pretty self-explanatory, but we as designers must remain vigilant against the siren call of what seems like a really elegant and strictly typed solution.
The generics above establish a pattern that must be adhered to by subsequent extenders and implementors. And to what end? So that a caller can attach properties to an application and access those in a statically typed manner, i.e. without casting?
But aren’t properties on an application exactly what we just worked so hard to eliminate? Isn’t the recommended pattern to create a “settings” object and add it to the IOC instead? That is, as of Quino 2.0, you get an IApplication
and obtain the desired settings from its IOC. Technically, the cast is still taking place in the IOC somewhere, but that seems somehow less bad than a direct cast.
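As a rough sketch of that pattern, assume the caller registered a settings object during configuration; the settings interface and the container accessor used here are hypothetical stand-ins, not actual Quino API.

// Hypothetical settings interface registered in the IOC during configuration.
public interface ICodeGeneratorSettings
{
  string OutputFolder { get; }
}

private static void GenerateCode(IApplication application)
{
  // GetServices() stands in for however the application exposes its IOC.
  // The settings are resolved from the container instead of casting the
  // application itself to a more derived type.
  var settings = application.GetServices().GetInstance<ICodeGeneratorSettings>();
  Console.WriteLine("Generating code to " + settings.OutputFolder);
}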
If the framework recommends that users don’t add properties to an application—and ruthlessly eliminated all standard properties and descendants—then why would the framework turn around and add support—at considerable cost in maintenance and readability and extendibility—for callers that expect a certain type of application?
Let’s take a look at the non-generic implementation and see what we lose or gain. The final version of the IApplicationManager
API is shown below, which properly balances the concerns of all stakeholders and hopefully will stand the test of time (or at least last until the next major revision).
IApplication CreateAndStartUp(
  Func<IApplicationCreationSettings, IApplication> createApplication
);

IApplicationExecutionTranscript Run(
  Func<IApplicationCreationSettings, IApplication> createApplication,
  Action<IApplication> run
);
These are the hard questions of API design: ensuring consistency, enforcing intent and balancing simplicity and cleanliness of code with expressiveness.
Run()
method for the desired type of application. Almost all of the startup code is shared and the pattern is the same everywhere.
ApplicationManager
were it to have been defined with generic parameters. Yet another thing to consider when choosing how to define your API.
IApplication
everywhere—and most probably will, because the advantage offered by making everything generic is vanishingly small.
If your API looks this scary, entropy will eat it alive before the end of the week, to say nothing of its surviving to the next major version.
IApplication
as the extended parameter in some cases and TApplication
in others). This issue is in how the application object is registered in the IOC. During development, when the framework was still using generics everywhere (or almost everywhere), some parts of the code were retrieving a reference to the application using the most-derived type whereas the application had been registered in the container as a singleton using IApplication
. The call to retrieve the most derived type returned a new instance of the application rather than the pre-registered singleton, which was a subtle and difficult bug to track down.
Published by marco on 19. Sep 2015 07:29:59 (GMT-5)
Updated by marco on 26. Sep 2015 11:24:56 (GMT-5)
In this article, we’re going to discuss a bit more about the configuration library in Quino 2.0.
Other entries on this topic have been the articles about Encodo’s configuration library for Quino: part I, part II and part III.
The goal of this article is to discuss a concrete example of how we decided whether to use generic type parameters throughout the configuration part of Quino. The meat of that discussion will be in a part 2 because we’re going to have to lay some groundwork about the features we want first. (Requirements!)
As of Quino 2.0-beta2, the configuration library consisted of a central IApplication
interface which has a reference to an IOC container and a list of startup and shutdown actions.
As shown in part III, these actions no longer have a generic TApplication
parameter. This makes it not only much easier to use the framework, but also easier to extend it. In this case, we were able to remove the generic parameter without sacrificing any expressiveness or type-safety.
As of beta2, there were still several places where generic TApplication
parameters were cluttering the API. Could we perhaps optimize further? Throw out even more complexity without losing anything?
One of these places is the actual engine that executes the startup and shutdown actions. This code is a bit trickier than just a simple loop because Quino supports execution in debug mode—without exception-handling—and release mode—with global exception-handling and logging.
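A heavily simplified sketch of such an engine follows; the helper methods and the transcript-creation call are invented for illustration, but the debug/release split is the interesting part: in debug mode, exceptions escape so the debugger stops where they are thrown, while in release mode a global handler logs the problem and shuts down as gracefully as possible.

public IApplicationExecutionTranscript Execute(IApplication application, bool isDebugMode)
{
  if (isDebugMode)
  {
    // No global exception handler: let the debugger stop at the throw site.
    return RunStartupActionsAndMainLoop(application);
  }

  try
  {
    return RunStartupActionsAndMainLoop(application);
  }
  catch (Exception exception)
  {
    // Global exception handler: log, clean up and report a crashed execution.
    LogError(exception);
    return CreateCrashedTranscript(exception);
  }
}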
As with any application that uses an IOC container, there is a configuration phase, during which the container can be changed and an execution phase, during which the container produces objects but can no longer be re-configured.
Until 2.0-beta2, the execution engine was encapsulated in several extension methods called Run()
, StartUp()
and so on. These methods were generally generic in TApplication
. I write “generally” because there were some inconsistencies with extension methods for custom application types like Winform or Console applications.
While extension methods can be really useful, this usage was not really appropriate as it violated the open/closed principle. For the final release of Quino, we wanted to move this logic into an IApplicationManager
so that applications using Quino could (A) choose their own logic for starting an application and (B) add this startup class to a non-Quino IOC container if they wanted to.
So far, so good. Before we discuss how to rewrite the application manager/execution engine, we should quickly revisit what exactly this engine is supposed to do. As it turns out, not only do we want to make an architectural change to make the design more open for extension, but the basic algorithm for starting an application changed, as well.
What does it mean to run an application?
Quino has always acknowledged and kinda/sorta supported the idea that a single application can be run in different ways. Even an execution that results in immediate failure technically counts as an execution, as a traversal of the state machine defined by the application.
If we view an application as the state machine that it is, then every application has at least two terminal nodes: OK and Error.
But what does OK mean for an application? In Quino, it means that all startup actions were executed without error and the run()
action passed in by the caller was also executed without error. Anything else results in an exception and is shunted to Error.
But is that true, really? Can you think of other ways in which an application could run to completion without having executed its main task and yet not have failed? For most applications, the answer is yes. Almost every application—and certainly every Quino application—supports a command line. One of the default options for the command line of a Quino application is -h
, which shows a manual for the other command-line options.
If the application is running in a console, this manual is printed to the console; for a Winform application, a dialog box is shown; and so on.
This “help” mode is actually a successful execution of the application that did not result in the main event loop of the application being executed.
Thought of in this way, any command-line option that controls application execution could divert the application to another type of terminal node in the state machine. A good example is when an application provides support for importing or exporting data via the command line.
A terminal node is also not necessarily only Crashed
or Ok
. Almost any application will also need to have a Canceled
mode that is a perfectly valid exit state. For example,
These are two ways in which a standard Quino application could run to completion without crashing but without having accomplished any of its main tasks. It ran and it didn’t crash, but it also didn’t do anything useful.
This section title sounds a bit pretentious, but that’s exactly what we want to discuss here. Instead of having just start and terminal nodes, the Quino startup supports cycles through intermediate nodes as well. What the hell does that mean? It means that some nodes may trigger Quino to restart in a different mode in order to handle a particular kind of error condition that could be repaired. [1]
A concrete example is desperately needed here, I think. The main use of this feature in Quino right now is to support on-the-fly schema-migration without forcing the user to restart the application. This feature has been in Quino from the very beginning and is used almost exclusively by developers during development. The use case to support is as follows:
This workflow minimizes the amount of trouble that a developer has when either making changes or when integrating changes from other developers. In all cases in which the application model is different from the developer’s database schema, it’s very quick and easy to upgrade and continue working.
How does this work internally in Quino 2.0? The application starts up but somehow encounters an error that indicates that a schema migration might be required. This can happen in one of two ways:
DatabaseException
that is indicative of a schema-mismatch.
In all of these cases, the application that was running throws an ApplicationRestartException
, which the standard IApplicationManager
implementation knows how to handle. It handles it by shutting down the running application instance and asking the caller to create a new application, but this time one that knows how to handle the situation that caused the exception. Concretely, the exception includes an IApplicationCreationSettings
descendant that the caller can use to decide how to customize the application to handle that situation.
The manager then runs this new application to completion (or until a new RestartApplicationException
is thrown), shuts it down, and asks the caller to create the original application again, to give it another go.
In the example above, if the user has successfully migrated the schema, then the application will start on this second attempt. If not, then the manager enters the cycle again, attempting to repair the situation so that it can get to a terminal node. Naturally, the user can cancel the migration and the application also exits gracefully, with a Canceled
state.
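A stripped-down sketch of that loop might look like the following. The real IApplicationManager is more involved; the default settings class, CreateTranscript() and ShutDown() are invented for illustration, and the name of the settings property on the exception is an assumption.

public IApplicationExecutionTranscript Run(
  Func<IApplicationCreationSettings, IApplication> createApplication,
  Action<IApplication> run)
{
  var settings = new StandardCreationSettings(); // hypothetical default settings

  while (true)
  {
    var application = createApplication(settings);
    try
    {
      run(application);
      return CreateTranscript(application); // terminal node: OK, Canceled, …
    }
    catch (ApplicationRestartException restart)
    {
      // Create and run a "repair" application (e.g. the schema migrator)
      // described by the exception, then loop around to give the original
      // application another go. (A nested restart is not handled here.)
      var repairApplication = createApplication(restart.CreationSettings);
      try
      {
        run(repairApplication);
      }
      finally
      {
        ShutDown(repairApplication);
      }
    }
    finally
    {
      ShutDown(application);
    }
  }
}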
A few examples of possible application execution paths:
The pattern is the same for interactive, client applications as for headless applications like test suites, which attempt migration once and abort if not successful. Applications like web servers or other services will generally only support the OK and Error states and fail when they encounter a RestartApplicationException
.
Still, it’s nice to know that the pattern is there, should you need it. It fits relatively cleanly into the rest of the API without making it more complicated. The caller passes two functions to the IApplicationManager
: one to create an application and one to run it.
An example from the Quino CodeGeneratorApplication
is shown below:
internal static void Main()
{
  new ApplicationManager().Run(CreateApplication, GenerateCode);
}

private static IApplication CreateApplication(
  IApplicationCreationSettings applicationCreationSettings
) { … }

private static void GenerateCode(IApplication application) { … }
We’ll see in the next post what the final API looks like and how we arrived at the final version of that API in Quino 2.0.
Published by marco on 19. Sep 2015 07:18:15 (GMT-5)
Encodo first published a Git Handbook for employees in September 2011 and last updated it in July of 2012. Since then, we’ve continued to use Git, refining our practices and tools. Although a lot of the content is still relevant, some parts are quite outdated and the overall organization suffered through several subsequent, unpublished updates.
What did we change from version 2.0?
You can download version 3 of the Git Handbook or get the latest copy from here.
Chapter 3, Basic Concepts and chapter 4, Best Practices have been included in their entirety below.
Focused commits are required; small commits are highly recommended. Keeping the number of changes per commit tightly focused on a single task helps in many cases.
For example, if you are working on a bug fix and discover that you need to refactor a file as well, or clean up the documentation or formatting, you should finish the bug fix first, commit it and then reformat, document or refactor in a separate commit.
Even if you have made a lot of changes all at once, you can still separate changes into multiple commits to keep those commits focused. Git even allows you to split changes from a single file over multiple commits (the Git GUI provides this functionality, as does the index editor in SmartGit).
Use the staging area to make quick snapshots without committing changes but still being able to compare them against more recent changes.
For example, suppose you want to refactor the implementation of a class.
Where you develop new code depends entirely on the project release plan.
Follow these rules for which command to use to combine two branches:
A branching model is required in order to successfully manage a non-trivial project.
Whereas a trivial project generally has a single branch and few or no tags, a non-trivial project has a stable release—with tags and possible hotfix branches—as well as a development branch—with possible feature branches.
A common branching model in the Git world is called Git Flow. Previous versions of this manual included more specific instructions for using the Git Flow-plugin for Git but experience has shown that a less complex branching model is sufficient and that using standard Git commands is more transparent.
However, since Git Flow is a very widely used branching model, retaining the naming conventions helps new developers more easily understand how a repository is organized.
The following list shows the branch types as well as the naming convention for each type:
The main difference from the Git Flow branching model is that there is no explicit stable branch. Instead, the last version tag serves the purpose just as well and is less work to maintain. For more information on where to develop code, see “3.3 – Developing New Code”.
To get a better picture of how these branches are created and merged, the following diagram depicts many of the situations outlined above.
The diagram tells the following story:
Published by marco on 3. Sep 2015 12:30:57 (GMT-5)
Way back in February, I wrote about my experiences with ReSharper 9 when it first came out. The following article provides an update, this time with version 9.2, released just last week.
tl;dr: I’m back to ReSharper 8.2.3 and am a bit worried about the state of the 9.x series of ReSharper. Ordinarily, JetBrains has eliminated performance, stability and functional issues by the first minor version-update (9.1), to say nothing of the second (9.2).
In the previous article, my main gripe was with the unit-test runner, which was unusable due to flakiness in the UI, execution and change-detection. With the release of 9.2, the UI and change-detection problems have been fixed, but the runner is still quite flaky at executing tests.
What follows is the text of the report that I sent to JetBrains when they asked me why I uninstalled R# 9.2.
As with 9.0 and 9.1, I am unable to productively use the 9.2 Test Runner with many of my NUnit tests. These tests are not straight-up, standard tests, but R# 8.2.3 handled them without any issues whatsoever.
What’s special about my tests?
There are quite a few base classes providing base functionality. The top layers provide scenario-specific input via a generic type parameter.
- TestsBase
  - OtherBase<TMixin> (7 of these, one with an NUnit CategoryAttribute)
    - ConcreteTests<TMixin> (defines tests with NUnit TestAttributes)
      - ProviderAConcreteTests<TMixin> (CategoryAttribute)
        - ProtocolAProviderAConcreteTests (TMixin = ProtocolAProviderA; TestFixtureAttribute, CategoryAttributes)
        - ProtocolBProviderAConcreteTests (TMixin = ProtocolBProviderA; TestFixtureAttribute, CategoryAttributes)
      - ProviderBConcreteTests<TMixin> (CategoryAttribute)
        - ProtocolAProviderBConcreteTests (TMixin = ProtocolAProviderB; TestFixtureAttribute, CategoryAttributes)
        - ProtocolBProviderBConcreteTests (TMixin = ProtocolBProviderB; TestFixtureAttribute, CategoryAttributes)

The test runner in 9.2 is not happy with this at all. The test explorer shows all of the tests correctly, with the test counts correct. If I select a node for all tests for ProviderB and ProtocolA (696 tests in 36 fixtures), R# loads 36 non-expandable nodes into the runner and, after a bit of a wait, marks them all as inconclusive. Running an individual test-fixture node does not magically cause the tests to load or appear and also shows inconclusive (after a while; it seems the fixture setup executes as expected but the results are not displayed).
If I select a specific, concrete fixture and add or run those tests, R# loads and executes the runner correctly. If I select multiple test fixtures in the explorer and add them, they also show up as expandable nodes, with the correct test counts, and can be executed individually (per fixture). However, if I elect to run them all by running the parent node, R# once again marks everything as inconclusive.
As I mentioned, 8.2.3 handles this correctly and I feel R# 9.2 isn’t far off—the unit-test explorer does, after all, show the correct tests and counts. In 9.2, it’s not only inconvenient, but I’m worried that my tests are not being executed with the expected configuration.
Also, I really missed the StyleCop plugin for 9.2. There’s a beta version for 9.1 that caused noticeable lag, so I’m still waiting for a more unobtrusive version for 9.2 (or any version at all).
While it’s possible that there’s something I’m doing wrong, or there’s something in my installation that’s strange, I don’t think that’s the problem. As I mentioned, test-running for the exact same solution with 8.2.3 is error-free and a pleasure to use. In 9.2, the test explorer shows all of the tests correctly, so R# is clearly able to interpret the hierarchy and attributes (noted above) as I’ve intended them to be interpreted. This feels very much like a bug or a regression for which JetBrains doesn’t have test coverage. I will try to work with them to help them get coverage for this case.
Additionally, the StyleCop plugin is absolutely essential for my workflow and there still isn’t an official release for any of the 9.x versions. ReSharper 9.2 isn’t supported at all yet, even in prerelease form. The official Codeplex page shows the latest official version as 4.7, released in January of 2012 for ReSharper 8.2 and Visual Studio 2013. One would imagine that VS2015 support is in the works, but it’s hard to say. There is a page for StyleCop in the ReSharper extensions gallery but that shows a beta4, released in April of 2015, that only works with ReSharper 9.1.x, not 9.2. I tested it with 9.1.x, but it noticeably slowed down the UI. While typing was mostly unaffected, scrolling and switching file-tabs was very laggy. Since StyleCop is essential for so many developers, it’s hard to see why the plugin gets so little love from either JetBrains or Microsoft.
The “Go To Word” plugin is not essential but it is an extremely welcome addition, especially with so much more client-side work depending on text-based bindings that aren’t always detected by ReSharper. In those cases, you can find—for example—all the references of a Knockout template by searching just as you would for a type or member. Additionally, you benefit from the speed of the ReSharper indexing engine and search UI instead of using the comparatively slow and ugly “Find in Files” support in Visual Studio. Alternatives suggested in the comments to the linked issue above all depend on building yet another index of data (e.g. Sando Code Search Tool). JetBrains has pushed off integrating go-to-word until version 10. Again, not a deal-breaker, but a shame nonetheless, as I’ll have to do without it in 9.x until version 10 is released.
With so much more client-side development going on in Visual Studio and with dynamic languages and data-binding languages that use name-matching for data-binding, GoToWord is more and more essential. Sure, ReSharper can continue to integrate native support for finding such references, but until that happens, we’re stuck with the inferior Find-in-Files dialog or other extensions that increase the memory pressure for larger solutions.
Published by marco on 30. May 2015 23:51:19 (GMT-5)
The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.
In beta1, we read about changes to configuration, the data driver architecture, DDL commands, and security and access control in web applications.
In beta-2, we made the following additional improvements:
- IApplication, ICoreApplication and IMetaApplication. (QNO-4789, QNO-4788, QNO-4786, QNO-4785, QNO-4671, QNO-4669, QNO-4668, QNO-4667, QNO-4660)
- ICustomCommandBuilder. This was added by customer request, for applications that formulate queries that are beyond what the Quino ORM is currently capable of mapping. A blog post with more detail on how this works is forthcoming. (QNO-4802)
- DataContract and DataMember attributes in metadata and generated code. (QNO-4823, QNO-4826)

This release addressed some issues that have been bugging us for a while (almost 3 years in one case).
You will not be missed.
As we’ve mentioned before, this release is absolutely merciless in regard to backwards compatibility. Old code is not retained as Obsolete. Instead, a project upgrading to 2.0 will encounter compile errors.
That said, if you arm yourself with a bit of time, ReSharper and the release notes (and possibly keep an Encodo employee on speed-dial), the upgrade is not difficult. It consists mainly of letting ReSharper update namespace references for you. In cases where the update is not so straightforward, we’ve provided release notes.
One of the few things you’ll be able to keep (at least for a minor version or two) is the old-style generated code. We made this concession because, while even a large solution can be upgraded from 1.13.0 to 2.0 relatively painlessly in about an hour (we’ve converted our own internal projects to test), changing the generated-code format is potentially a much larger change. Again, an upgrade to the generated-code format isn’t complicated but it might require more than an hour or two’s worth of elbow grease to complete.
Therefore, you’ll be able to not only retain your old generated code, but the code generator will continue to support the old-style code-generation format for further development. Expect the grace period to be relatively short, though.
Regardless of whether you elect to keep the old-style generated code, you’ll have to do a little bit of extra work just to be able to generate code again.
Before you can regenerate, you’ll have to manually update your previously generated code in the main model file, as shown below.
static MyModel()
{
Messages = new InMemoryRecorder();
Loader = new ModelLoader(() => Instance, () => Messages, new MyModelGenerator());
}
public static IMetaModel CreateModel(IExtendedRecorder recorder)
{
if (recorder == null) { throw new ArgumentNullException("recorder"); }
var result = Loader.Generator.CreateModel(recorder);
result.Configure();
return result;
}
// More code …
/// <inheritdoc/>
protected override void DoConfigure()
{
base.DoConfigure();
ConfigurePreferredTypes();
ApplyCustomConfiguration();
}
static MyModel()
{
Messages = new InMemoryRecorder();
Loader = new ModelLoader(() => Instance, () => Messages, new MyModelGenerator());
}
public static IMetaModel CreateModel(IExtendedRecorder recorder)
{
if (recorder == null) { throw new ArgumentNullException("recorder"); }
var result = (MyModel)new MyModelGenerator().CreateModel(
ServiceLocator.Current.GetInstance<IExpressionParser>(),
ServiceLocator.Current.GetInstance<IMetaExpressionFactory>(),
recorder
);
result.ConfigurePreferredTypes();
result.ApplyCustomConfiguration();
return result;
}
/// <inheritdoc/>
protected override void DoConfigure()
{
base.DoConfigure();
ConfigurePreferredTypes();
ApplyCustomConfiguration();
}
In the application configuration, the first time you generate code with Quino 2.0, you should use:
ModelLoader = MyModel.Loader;
this.UseMetaSimpleInjector();
this.UseModelLoader(MyModel.CreateModel);
After regenerating code, you should use the following for version-2 generated code:
ModelLoader = MyModel.Loader;
this.UseMetaSimpleInjector();
this.UseModelLoader(MyModelExtensions.CreateModelAndMetadata);
…and the following for version-1 generated code:
ModelLoader = MyModel.Loader;
this.UseMetaSimpleInjector();
this.UseModelLoader(MyModel.CreateModel);
As you can see, we’ve already done quite a bit of work in beta1 and beta2. We have a few more tasks planned for the feature-complete release candidate for 2.0:
Move the schema-migration metadata table to a module.
The Quino schema-migration extracts most of the information it needs from database schema itself. It also stores extra metadata in a special table. This table has been with Quino since before modules were supported (over seven years) and hence was built in a completely custom manner. Moving this support to a Quino metadata module will remove unnecessary implementation and make the migration process more straightforward. (QNO-4888)
Separate collection algorithm from storage/display method in IRecorder
and descendants.
The recording/logging library has a very good interface but the implementation for the standard recorders has become too complex as we added support for multi-threading, custom disposal and so on. We want to clean this up to make it easier to extend the library with custom loggers. (QNO-4888)
Finish integrating building and publishing NuGet and symbol packages into Quino’s release process.
And, finally, once we have the assemblies split up to our liking, we’ll finalize the NuGet packages for the Quino library and leave the direct-assembly-reference days behind us, ready for Visual Studio 2015.
(QNO-4376)
That’s all we’ve got for now. See you next month for the next (and, hopefully, final) update!
Published by marco on 17. May 2015 17:45:56 (GMT-5)
This discussion about configuration spans three articles:
Registering with an IOC is all well and good, but something has to make calls into the IOC to get the ball rolling.
Even service applications—which start up quickly and wait for requests to do most of their work—have basic operations to execute before declaring themselves ready.
Things can get complex when starting up registered components and performing basic checks and non-IOC configuration.
Part of the complexity of configuration and startup is that developers quickly forget all of the things that they’ve come to expect from a mature product and start from zero again with each application. Encodo and Quino applications take advantage of prior work to include standard behavior for a lot of common situations.
Some components can be configured once and directly by calling a method like UseMetaTranslations(string filePath)
, which includes all of the configuration options directly in the composition call. This pattern is perfect for options that are used only by one action or that wouldn’t make sense to override in a subsequent action.
So, for simple actions, an application can just replace the existing action with its own, custom action. In the example above, an application for which translations had already been configured would just call UseMetaTranslations()
again in order to override that behavior with its own.
Most applications will replace standard actions or customize standard settings
Some components, however, will want to expose settings that can be customized by actions before they are used to initialize the component.
For example, there is an action called SetUpLoggingAction
, which configures logging for the application. This action uses IFileLogSettings
and IEventLogSettings
objects from the IOC during execution to determine which types of logging to configure.
An application is, of course, free to replace the entire SetUpLoggingAction
action with its own, completely custom behavior. However, an application that just wanted to change the log-file behavior or turn on event-logging could use the Configure<TService>()
method [1], as shown below.
application.Configure<IFileLogSettings>(
s => s.Behavior = LogFileBehavior.MultipleFiles
);
application.Configure<IEventLogSettings>(
s => s.Enabled = true
);
A Quino application object has a list of StartupActions
and a list of ShutdownActions
. Most standard middleware methods register objects with the IOC and add one or more actions to configure those objects during application startup.
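A typical middleware method is therefore quite small: it registers the settings its action will need and queues the action itself. The sketch below only illustrates the shape of such a method; the registration helper and the concrete settings class are invented names, while IFileLogSettings and SetUpLoggingAction are the types mentioned above.

public static IApplication UseFileLogging(this IApplication application)
{
  // Register the settings object that the startup action (and anyone else)
  // can later retrieve from the IOC. FileLogSettings is a hypothetical
  // default implementation of IFileLogSettings.
  application.RegisterSingleton<IFileLogSettings>(new FileLogSettings());

  // Queue the action that actually configures logging during startup.
  // (The action's constructor is shown parameter-less for illustration.)
  application.StartupActions.Add(new SetUpLoggingAction());

  return application;
}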
Actions have existed for quite a while in Quino. In Quino 2, they have been considerably simplified and streamlined to the point where all but a handful are little more than a functional interface [2].
The list below will give you an idea of the kind of configuration actions we’re talking about.
For installed/desktop/mobile applications, there’s also:
Quino applications also have actions to configure metadata:
Application shutdown has a smaller set of vital cleanup chores that:
The following example [3] is for the 1.x version of the relatively simple ConfigureDisplayLanguageAction
.
public class ConfigureDisplayLanguageAction<TApplication>
  : ApplicationActionBase<TApplication>
  where TApplication : ICoreApplication
{
  public ConfigureDisplayLanguageAction()
    : base(CoreActionNames.ConfigureDisplayLanguage)
  {
  }

  protected override int DoExecute(
    TApplication application, ConfigurationOptions options, int currentResult)
  {
    // Configuration code…
  }
}
What is wrong with this startup action? The following list illustrates the main points, each of which is addressed in more detail in its own section further below.
- The ConfigurationOptions parameter introduces an unnecessary layer of complexity
- The TApplication parameter complicates declaration, instantiation and extension methods that use the action
- The int return type, along with the currentResult parameter, is a bad way of controlling flow

The same startup action in Quino 2.x has the following changes from the Quino 1.x version above (additions marked with “+”, deletions with “-”).
-public class ConfigureDisplayLanguageAction<TApplication>
+public class ConfigureDisplayLanguageAction
-  : ApplicationActionBase<TApplication>
+  : ApplicationActionBase
-  where TApplication : ICoreApplication
 {
   public ConfigureDisplayLanguageAction()
     : base(CoreActionNames.ConfigureDisplayLanguage)
   {
   }

-  protected override int DoExecute(
-    TApplication application, ConfigurationOptions options, int currentResult)
+  public override void Execute()
   {
     // Configuration code…
   }
 }
As you can see, quite a bit of code and declaration text was removed, all without sacrificing any functionality. The final form is quite simple, inheriting from a simple base class that manages the name of the action and overrides a single parameter-less method. It is now much easier to see what an action does and the barrier to entry for customization is much lower.
public class ConfigureDisplayLanguageAction : ApplicationActionBase
{
  public ConfigureDisplayLanguageAction()
    : base(CoreActionNames.ConfigureDisplayLanguage)
  {
  }

  public override void Execute()
  {
    // Configuration code…
  }
}
In the following sections, we’ll take a look at each of the problems indicated above in more detail.
The ConfigurationOptions parameter
These options are a simple enumeration with values like Client
, Testing
, Service
and so on. They were used only by a handful of standard actions.
These options made it more difficult to decide how to implement the action for a given task. If two tasks were completely different, then a developer would know to create two separate actions. However, if two tasks were similar, but could be executed differently depending on application type (e.g. testing vs. client), then the developer could still have used two separate actions, but could also have used the configuration options. Multiple ways of doing the exact same thing is all kinds of bad.
Parameters like this conflict conceptually with the idea of using composition to build an application. To keep things simple, Quino applications should be configured exclusively by composition. Composing an application with service registrations and startup actions and then passing options to the startup introduced an unneeded level of complexity.
Instead, an application now defines a separate action for each set of options. For example, most applications will need to set up the display language to use—be it for a GUI, a command-line or just to log messages in the correct language. For that, the application can add a ConfigureDisplayLanguageAction
to the startup actions or call the standard method UseCore()
. Desktop or single-user applications can use the ConfigureGlobalDisplayLanguageAction
or call UseGlobalCore()
to make sure that global language resources are also configured.
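In code, the difference between the two approaches is small; the sketch below assumes an application variable that exposes the StartupActions list mentioned above and is not taken from the Quino sources.

// Fine-grained: add exactly the action you want.
application.StartupActions.Add(new ConfigureDisplayLanguageAction());

// Coarse-grained: pull in the standard set of core actions, which includes
// the display-language configuration.
application.UseCore();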
The TApplication generic parameter
The generic parameter to this interface complicates the IApplication<TApplication>
interface and causes no end of trouble in MetaApplication
, which actually inherits from IApplication<IMetaApplication>
for historical reasons.
Originally, this parameter guaranteed that an action could be stateless. However, each action object is attached to exactly one application (in the IApplication<TApplication>.StartupActions
list). So the action that is attached to an application is technically stateless, and a completely different application than the one to which the action is attached could be passed to the IApplicationAction.Execute
…which makes no sense whatsoever.
Luckily, this never happens, and only the application to which the action is attached is passed to that method. If that’s the case, though, why not just create the action with the application as a constructor parameter when the action is added to the StartupActions
list? There is no need to maintain statelessness for a single-use object.
This way, there is no generic parameter for the IApplication
interface, all of the extension methods are much simpler and applications are free to create custom actions that work with descendants of IApplication
simply by requiring that type in the constructor parameter.
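As a sketch of what such a custom action can look like, assume the calling application defines its own IInventoryApplication descendant of IApplication; both the interface and the action-name string are invented for illustration, and the base class is the one shown later in this article.

public class ConfigureInventoryCacheAction : ApplicationActionBase
{
  private readonly IInventoryApplication _application;

  public ConfigureInventoryCacheAction(IInventoryApplication application)
    : base("ConfigureInventoryCache")
  {
    // The action works with a descendant of IApplication simply by requiring
    // that type here; no generic parameter is needed anywhere.
    _application = application;
  }

  public override void Execute()
  {
    // Configuration code that uses _application…
  }
}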
The original startup avoided exceptions, preferring an integer return result instead.
In release mode, a global exception handler is active and is there to help the application exit more or less smoothly—e.g. by logging the error, closing resources where possible, and so on.
A global exception handler is terrible for debugging, though. For exceptions that are caught, the default behavior of the debugger is to stop where the exception is caught rather than where it is thrown. Instead, you want exceptions raised by your application to stop the debugger where they are thrown.
So that’s part of the reason why the startup and shutdown in 1.x used return codes rather than exceptions.
The other reason Quino used result codes is that most non-trivial applications actually have multiple paths through which they could successfully run.
Exactly which path the application should take depends on startup conditions, parameters and so on. Some common examples are:
To show command-line help, an application execute its startup actions in order. It reaches the action that checks whether the user requested command-line help. This action processes the request, displays that help and then wants to smoothly exit the application. The “main” path—perhaps showing the user a desktop application—should no longer be executed.
Non-trivial applications have multiple valid run profiles.
Similarly, the action that checks the database schema determines that the schema in the data provider doesn’t match the model. In this case, it would like to offer the user (usually a developer) the option to update the schema. Once the schema is updated, though, startup should be restarted from the beginning, trying again to run the main path.
The Quino 1.x startup addressed the design requirements above with return codes, but this imposes an undue burden on implementors. There was also confusion as to when it was OK to actually throw an exception rather than returning a special code.
Instead, the Quino 2.x startup always uses exceptions to indicate errors. There are a few special types of exceptions recognized by the startup code that can indicate whether the application should silently—and successfully—exit or whether the startup should be attempted again.
There is of course more detail into which we could go on much of what we discussed in these three articles, but that should suffice for an overview of the Quino configuration library.
Published by marco on 17. May 2015 17:45:20 (GMT-5)
Updated by marco on 19. Sep 2015 07:13:29 (GMT-5)
In this article, we’ll continue the discussion about configuration started in part I. We wrapped up that part with the following principles to keep in mind while designing the new system.
Quino’s configuration inconsistencies and issues have been well-known for several versions—and years—but the opportunity to rewrite it comes only now with a major-version break.
Luckily for us, ASP.NET has been going through a similar struggle and evolution. We were able to model some of our terminology on the patterns from their next version. For example, ASP.NET has moved to a pattern where an application-builder object is passed to user code for configuration. The pattern there is to include middleware (what we call “configuration”) by calling extension methods starting with “Use”.
Quino has had a similar pattern for a while, but the method names varied: “Integrate”, “Add”, “Include”; these methods have now all been standardized to “Use” to match the prevailing .NET winds.
Additionally, Quino used to make a distinction between an application instance and its “configuration”—the template on which an application is based. No more. Too complicated. This design decision, coupled with the promotion of a platform-specific “Feedback” object to first-level citizen, led to an explosion of generic type parameters. [1]
The distinction between configuration (template) and application (instance) has been removed. Instead, there is just an application object to configure.
The feedback object is now to be found in the service locator. An application registers a platform-specific feedback to use as it would any other customization.
[1] The CustomWinformFeedback in the Quino 1.x code at the end of this article provides a glaring example.
ASP.NET vNext has made the service locator a first-class citizen. In ASP.NET, applications receive an IApplicationBuilder in one magic “Configure” method and an IServiceCollection in another magic “ConfigureServices” method.
In Quino 2.x, the application is in charge of creating the service container, though Quino provides a method to create and configure a standard one (SimpleInjector). That service locator is passed to the IApplication object and is subsequently accessible there.
Services can of course be registered directly or by calling pre-packaged Middleware methods. Unlike ASP.NET vNext, Quino 2.x makes no distinction between configuring middleware and including the services required by that middleware.
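A rough sketch of what such a “Use” method looks like; the registration API shown here is illustrative rather than Quino's actual one:

public static class SoftwareUpdaterMiddleware
{
  public static void UseSoftwareUpdater(this IApplication application)
  {
    // Register the services this middleware needs; calling the method is how an
    // application opts in to the functionality.
    application.Services.RegisterSingle<ISoftwareUpdater, SoftwareUpdater>();
    application.Services.RegisterSingle<ISoftwareUpdateFeedback, DefaultSoftwareUpdateFeedback>();
  }
}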
Quino’s configuration library has its roots in a time before we were using an IOC container. The configuration was defined as a hierarchy of configuration classes that modeled the following layers.
For example, an application knows its RunMode (“debug” or “release”), has an exit code and has a logging mechanism (e.g. IRecorder).
While these layers are still somewhat evident, the move to middleware packages has blurred the distinction between them. Instead of choosing a concrete configuration base class, an application now calls a handful of “Use” methods to indicate what kind of application to build.
There are, of course, still helpful top-level methods—e.g. UseCore() and UseMeta()—that pull in all of the middleware for the standard application types. But, crucially, the application is free to tweak this configuration with more granular calls to register custom configuration in the service locator.
This is a flexible and transparent improvement over passing esoteric parameters to monolithic configuration methods, as in the previous version.
Just as a simple example: whereas a Quino 1.x standalone application would set ICoreConfiguration.UseSoftwareUpdater to true, a Quino 2.x application calls UseSoftwareUpdater(). Where a Quino 1.x Winform application would inherit from the WinformFeedback in order to return a customized ISoftwareUpdateFeedback, a Quino 2.x application calls UseSoftwareUpdateFeedback().
The software-update feedback class is defined below and is used by both versions.
public class CustomSoftwareUpdateFeedback : WinformSoftwareUpdateFeedback<IMetaApplication>
{
  protected override ResponseType DoConfirmUpdate(IMetaApplication application, …)
  {
    …
  }
}
That’s where the similarities end, though. The code samples below show the stark difference between the old and new configuration systems.
As explained above, Quino 1.x did not allow registration of a sub-feedback like the software-updater. Instead, the application had to inherit from the main feedback and override a method to create the desired sub-feedback.
class CustomWinformFeedback : WinformFeedback
{
  public virtual ISoftwareUpdateFeedback<TApplication> GetSoftwareUpdateFeedback<TApplication, TConfiguration, TFeedback>()
    where TApplication : ICoreApplication<TConfiguration, TFeedback>
    where TConfiguration : ICoreConfiguration
    where TFeedback : ICoreFeedback
  {
    return new CustomSoftwareUpdateFeedback(this);
  }
}
var configuration = new CustomConfiguration()
{
  UseSoftwareUpdater = true
};

WinformDxMetaConfigurationTools.Run(
  configuration,
  app => new CustomMainForm(app),
  new CustomWinformFeedback()
);
The method-override in the feedback was hideous and scared off a good many developers. Not only that: the pattern was to use a magical, platform-specific WinformDxMetaConfigurationTools.Run method to create an application, run it and dispose it.
Software-update feedback-registration in Quino 2.x adheres to the principles outlined at the top of the article: it is consistent and uses common patterns (functionality is included and customized with methods named “Use”), configuration is opt-in, and the IOC container is used throughout (albeit implicitly with these higher-level configuration methods).
using (var application = new CustomApplication())
{
  application.UseMetaWinformDx();
  application.UseSoftwareUpdater();
  application.UseSoftwareUpdaterFeedback(new CustomSoftwareUpdateFeedback());
  application.Run(app => new CustomMainForm(app));
}
Additionally, the program has complete control over creation, running and disposal of the application. No more magic and implicit after-the-fact configuration.
In the next and (hopefully) final article, we’ll take a look at configuring execution—the actions to execute during startup and shutdown. Registering objects in a service locator is all well and good, but calls into the service locator have to be made in order for anything to actually happen.
Keeping this system flexible and addressing standard application requirements is a challenging but not insurmountable problem. Stay tuned.
Published by marco on 10. Apr 2015 15:36:06 (GMT-5)
In this article, I’ll continue the discussion about configuration improvements mentioned in the release notes for Quino 2.0-beta1. With beta2 development underway, I thought I’d share some more of the thought process behind the forthcoming changes.
what sort of patterns integrate and customize the functionality of libraries in an application?
An application comprises multiple tasks, only some of which are part of that application’s actual domain. For those parts not in the application domain, software developers use libraries. A library captures a pattern or a particular way of doing something, making it available through an abstraction. These simplify and smooth away detail irrelevant to the application.
A runtime and its standard libraries provide many such abstractions: for reading/writing files, connecting to networks and so on. Third-party libraries provide others, like logging, IOC, task-scheduling and more.
Because Encodo’s been writing software for a long time, we have a lot of patterns that we’ve come up with for our applications. These libraries are split into two main groups:
A sort of “meta” library that lies on top of all of this is configuration and startup of applications that use these libraries. That is, what sort of patterns integrate and customize the functionality of libraries in an application?
Almost nowhere in an application is the balance between K.I.S.S. and D.R.Y. more difficult to maintain than in configuration and startup.
So if we already know all of that, why does Quino need a new configuration library?
As mentioned above, there is a lot of commonality between applications in this area. An application will definitely want to incorporate such common configuration from a library. Updates and improvements to that library will then be applied as for any other. This is a good thing.
However, an application will also want to be able to tweak almost any given facet of this shared configuration. That is: just keep the good parts, have those upgraded when they’re changed, but apply customization and extend functionality for the application’s domain. Easy, right?
It is here that a good configuration library will find just the right level of granularity for customization. Too coarse? Then an application ends up throwing out too much common configuration in order to customize a small part of it. Too fine? Then the configuration system is too verbose or complex and the application avoids using it.
Instead, a configuration system should establish clear patterns—optimally, just one—for how to apply customization.
So if we already know all of that, then why does Quino need a new configuration library? Well…
It’s really easy to make things over-complicated and muddy. It’s really easy to end up growing several different kinds of extension systems over the years. Quino ended up with a generics-heavy API that made declaring new configuration components very wordy.
The core of Quino is the metadata definition for an application domain. That part has barely changed at all since we first wrote it lo so many years ago. We declared it to be our core business—the part that we are better than others at—the part we wanted to have under our own control. Our first draft [1] has held up remarkably well.
Many of the other components have undergone quite a bit of flux: changes in requirements and the components themselves as well as new development processes and patterns all contributed to change. Over time, various applications had different needs and made adjustments to a different iteration of the configuration library. We moved from supporting only single-threaded, single-user desktop applications to also supporting multi-user, multi-threaded services and web servers.
…we were left with an ugly configuration system that no-one wanted to extend or
use—so yet another would be invented.
For all of these different applications, we naturally wanted to maintain the common configuration where possible—but customizations for new platforms stretched the capabilities of the configuration library.
Customization would be made to a new version of that library, but applications that couldn’t be upgraded immediately forced backwards-compatibility and thus resulted in several different concurrent ways of configuring a particular facet of an application.
In order to keep things in one place, we ended up breaking the interface-separation rule. Dependencies started clumping drastically, but it was OK because nobody was trying to use one thing without the other ten. But it was hard to see what was going on; customization became a black box for all but one or two gurus. On and on it went, until we were left with an ugly configuration system that no-one wanted to extend or use—so yet another would be invented, ad-hoc. And so it went.
With Quino 2.0, we examined the existing system and came up with a list of principles.
In the next part, we’ll take a look at some concrete examples and documentation for the new patterns. [2]
Published by marco on 28. Mar 2015 23:26:29 (GMT-5)
The summary below describes major new features, items of note and breaking changes. The full list of issues is also available for those with access to the Encodo issue tracker.
These are the big ones that forced a major-version change.
Renamed IMessageRecorder to IRecorder, renamed IMessageStore to IInMemoryRecorder and consolidated IFilteredMessageRecorder into IRecorder. (QNO-4686, QNO-4696, QNO-4750, QNO-4557)
Some smaller, but important changes:
Added the RunInTransaction attribute. Specify the attribute on any IMetaTestFixture to wrap a test or every test in a fixture in a transaction. (QNO-4682)
Oh yeah. You betcha. This is a major release and we’ve knowingly made a decision not to maintain backwards-compatibility at all costs. The good news, though: the changes are relatively straightforward and easy to make if you’ve got a tool like ReSharper that can update using statements automatically.
As we saw in part I and part II of the guide to using NDepend, Quino 2.0 has unsnarled quite a few dependency issues. A large number of classes and interfaces have been moved out of the Encodo.Tools namespace. Many have been moved to Encodo.Core, but others have been scattered into more appropriate and more specific namespaces.
This is one part of the larger changes, easily addressed by using ReSharper to Alt + Enter your way through the compile errors.
Another large change is in renaming IMessageRecorder to IRecorder and IMessageStore to IInMemoryRecorder. Judicious use of search/replace or just a bit of elbow grease will get you through these as well.
Finally, probably the most far-reaching change is in merging IConfiguration into IApplication. In previous versions of Quino, applications would create a configuration object and pass that to a platform-dependent Quino Run() method. Some configuration was provided by the application and some by the platform-specific method.
The example for Quino 1.13.0 below comes from the JobVortex Winform application.
var configuration = new JobVortexConfiguration
{
  MainSettings = Settings.Default
};

configuration.Add(new JobVortexClientConfigurationPackage());

if (!string.IsNullOrEmpty(Settings.Default.DisplayLanguage))
{
  configuration.DisplayLanguage = new Language(Settings.Default.DisplayLanguage);
}

WinformDxMetaConfigurationTools.Run(
  configuration,
  app => new MainForm(app)
);
In Quino 2.0, the code above has been rewritten as shown below.
using (IMetaApplication application = new JobVortexApplication())
{
  application.MainSettings = Settings.Default;
  application.UseJobVortexClient();

  if (!string.IsNullOrEmpty(Settings.Default.DisplayLanguage))
  {
    application.DisplayLanguage = new Language(Settings.Default.DisplayLanguage);
  }

  application.Run(app => new MainForm(app));
}
As you can see, instead of creating a configuration, the program creates an application object. Instead of using configuration packages mixed with extension methods named “Integrate”, “Configure” and so on, the new API uses “Use” everywhere. This should be comfortable for people familiar with the OWIN/Katana configuration pattern.
It does, however, mean that IConfiguration, ICoreConfiguration and IMetaConfiguration don’t exist anymore. Instead, use IApplication, ICoreApplication and IMetaApplication.
Again, a bit of elbow grease will be needed to get through these compile errors, but there’s little to no risk or need for high-level decisions.
There are a lot of these prepackaged methods to help you create common kinds of applications:
UseCoreConsole() (a non-Quino application that uses the console)
UseMetaConsole() (a Quino application that uses the console)
UseCoreWinformDx() (a non-Quino application that uses Winform)
UseMetaWinformDx() (a Quino application that uses Winform)
UseReporting()
UseRemotingServer()
I think you get the idea. Once we have a final release for Quino 2.0, we’ll write more about how to use this new pattern.
This is still just an internal beta of the 2.0 final version. More changes are on the way, including but not limited to:
Removing IConfigurationPackage and standardizing the configuration API to be named “Use” everywhere (QNO-4771)
GenericObject improvements (QNO-4761, QNO-4762)
Moving ICoreApplication and IMetaApplication properties to configuration objects in the service locator; also improving use of and configuration of the service locator (QNO-4659)
See you there!
Published by marco on 13. Mar 2015 08:59:09 (GMT-5)
Microsoft has recently made a lot of their .NET code open-source. Not only is the code for many of the base libraries open-source but also the code for the runtime itself. On top of that, basic .NET development is now much more open to community involvement.
In that spirit, even endeavors like designing the features to be included in the next version of C# are online and open to all: C# Design Meeting Notes for Jan 21, 2015 by Mads Torgerson (GitHub).
You may be surprised at the version number “7”—aren’t we still waiting for C# 6 to be officially released? Yes, we are.
If you’ll recall, the primary feature added to C# 5 was support for asynchronous operations through the async/await keywords. Most .NET programmers are only getting around to using this rather far- and deep-reaching feature, to say nothing of the new C# 6 features that are almost officially available.
C# 6 brings the following features with it and can be used in the CTP versions of Visual Studio 2015 or downloaded from the Roslyn project (GitHub).
Some of the more interesting features of C# 6 are:
An out parameter can now be declared inline with var or a specific type. This avoids the ugly variable declaration outside of a call to a Try* method.
using can now be used with a static class as well as a namespace. Direct access to methods and properties of a static class should clean up some code considerably.
Instead of string.Format() and numbered parameters for formatting, C# 6 allows expressions to be embedded directly in a string (à la PHP): e.g. “{Name} logged in at {Time}”.
The null-propagating operator lets an expression evaluate to null when the target of a call is null. E.g. company.People?[0]?.ContactInfo?.BusinessAddress.Street includes three null-checks.
If the idea of using await correctly or wrapping your head around the C# 6 features outlined above doesn’t already make your poor head spin, then let’s move on to language features that aren’t even close to being implemented yet.
That said, the first set of design notes for C# 7 by Mads Torgerson (GitHub) include several interesting ideas as well.
Metaprogramming: Another focus for C# is reducing boilerplate and capturing common code-generation patterns. They’re thinking of delegation of interfaces through composition. Also welcome would be an improvement in the expressiveness of generic constraints.
Related User Voice issues:
Checking references for null at compile-time (where reasonable—they do acknowledge that they may end up with a “less ambitious approach”).
Lambda capture lists: One of the issues with closures is that they currently just close over any referenced variables. The compiler just makes this happen and for the most part works as expected. When it doesn’t work as expected, it creates subtle bugs that lead to leaks, race conditions and all sorts of hairy situations that are difficult to debug.
If you throw in the increased use of and nesting of lambda calls, you end up with subtle bugs buried in frameworks and libraries that are nearly impossible to tease out.
The idea of this feature is to allow a lambda to explicitly capture variables and perhaps even indicate whether the capture is read-only. Any additional capture would be flagged by the compiler or tools as an error.
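A plain C# illustration of the kind of accidental capture this would prevent (this shows today's behavior, not the proposed syntax):

var actions = new List<Action>();
var buffer = new byte[100000000]; // a large object we never meant to keep alive

for (var i = 0; i < 3; i++)
{
  // The lambda silently captures both i and buffer. All three delegates share the
  // same i, so each prints "3" when invoked later, and buffer stays alive for as
  // long as any delegate is referenced. A capture list would make both explicit.
  actions.Add(() => Console.WriteLine(i + ": " + buffer.Length));
}

foreach (var action in actions)
{
  action(); // prints "3: 100000000" three times
}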
Contracts(!): And, finally, this is the feature I’m most excited about because I’ve been waiting for integrated language support for Design by Contract for literally decades [1], ever since I read the Object-Oriented Software Construction 2 (Amazon) (OOSC2) for the first time. The design document doesn’t say much about it, but mentions that “.NET already has a contract system”, the weaknesses of which I’ve written about before. Torgersen writes:
“When you think about how much code is currently occupied with arguments and result checking, this certainly seems like an attractive way to reduce code bloat and improve readability.”
…and expressiveness and provability!
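For reference, the existing .NET contract system (System.Diagnostics.Contracts) expresses these checks as ordinary method calls rather than as part of the language, roughly like this:

using System.Diagnostics.Contracts;

public static string Capitalize(string text)
{
  Contract.Requires(!string.IsNullOrEmpty(text));                    // precondition
  Contract.Ensures(Contract.Result<string>().Length == text.Length); // postcondition

  return char.ToUpper(text[0]) + text.Substring(1);
}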
There are a bunch of User Voice issues that I can’t encourage you enough to vote for so we can finally get this feature:
With some or all of these improvements, C# 7 would move much closer to a provable language at compile-time, an improvement over being a safe language at run-time.
We can already indicate that instance data or properties are readonly. We can already mark methods as static to prevent the use of this. We can use ReSharper [NotNull] attributes to (kinda) enforce non-null references without using structs and incurring the debt of value-passing and -copying semantics.
I’m already quite happy with C# 5, but if you throw in some or all of the stuff outlined above, I’ll be even happier. I’ll still have stuff I can think of to increase expressiveness—covariant return types for polymorphic methods or anchored types or relaxed contravariant type-conformance—but this next set of features being discussed sounds really, really good.
I love the features of the language Eiffel, but haven’t ever been able to use it for work. The tools and IDE are a bit stuck in the past (very dated on Windows; X11 required on OS X). The language is super-strong, with native support for contracts, anchored types, null-safe programming, contravariant type-conformance, covariant return types and probably much more that C# is slowly but surely including with each version. Unfair? I’ve been writing about this progress for years (from newest to oldest):
Published by marco on 7. Mar 2015 08:11:14 (GMT-5)
In part I of this series, we discussed applications, which provide the model and data provider, and sessions, which encapsulate high-level data context. In part II, we covered command types and inputs to the data pipeline.
In this article, we’re going to take a look at the data pipeline itself.
The primary goal of the data pipeline is, of course, to correctly execute each query to retrieve data or command to store, delete or refresh data. The diagram to the right shows that the pipeline consists of several data handlers. Some of these refer to data sources, which can be anything: an SQL database or a remote service. [1]
The name “pipeline” is only somewhat appropriate: A command can jump out anywhere in the pipeline rather than just at the opposite end. A given command will be processed through the various data handlers until one of them pronounces the command to be “complete”.
In the previous parts, we learned that the input to the pipeline is an IDataCommandContext. To briefly recap, this object comprises the inputs to the command as well as command-specific state, such as a way to set values on objects (SetValue(IMetaProperty)); more detail on this later.
Where the pipeline metaphor holds up is that the command context will always start at the same end. The ordering of data handlers is intended to reduce the amount of work and time invested in processing a given command.
The first stage of processing is to quickly analyze the command to handle cases where there is nothing to do. For example,
The Objects list is empty.
Nothing in Objects has changed.
An object has a null value in the primary key or a foreign key that references a non-nullable, unique key.
It is useful to capture these checks in one or more analyzers for the following reasons,
If the analyzer hasn’t categorically handled the command and the command is to load data, the next step is to check caches. For the purposes of this article, there are two things that affect how long data is cached:
One of these is whether the isolationLevel is stricter than RepeatableRead.
Caches currently include the following standard handlers [2]:
The ValueListDataHandler returns immutable data. Since the data is immutable, it can be used independent of the transaction-state of the session in which the command is executed.
The SessionCacheDataHandler returns data that’s already been loaded or saved in this session, to avoid a call to a possibly high-latency back-end. This data is safe to use within the session with transactions because the cache is rolled back when a transaction is rolled back.
If the analyzer and cache haven’t handled a command, then we’re finally at a point where we can no longer avoid a call to a data source. Data sources can be internal or external.
The most common type is an external database:
Another standard data source is the Quino remote application server, which provides a classic interface- and method-based service layer as well as mapping nearly the full power of Quino’s generalized querying capabilities to an application server. That is, an application can smoothly switch between a direct connection to a database to using the remoting driver to call into a service layer instead.
The remoting driver supports both binary and JSON protocols. Further details are also beyond the scope of this article, but this driver has proven quite useful for scaling smaller client-heavy applications with a single database to thin clients talking to an application server.
And finally, there is another way to easily include “mini” data drivers in an application. Any metaclass can include an IDataHandlerAspect that defines its own data driver as well as its capabilities. Most implementations use this technique to bind in immutable lists of data. But this technique has also been used to load/save data from/to external APIs, like REST services. We can take a look at some examples in more detail in another article.
The mini data driver created for use with an aspect can relatively easily be converted to a full-fledged data handler.
The last step in a command is what Quino calls “local evaluation”. Essentially, if a command cannot be handled entirely within the rest of the data pipeline—either entirely by an analyzer, one or more caches or the data source for that type of object—then the local analyzer completes the command.
What does this mean? Any orderings or restrictions in a query that cannot be mapped to the data source (e.g. a C# lambda is too complex to map to SQL) are evaluated on the client rather than the server. Therefore, any query that can be formulated in Quino can also be evaluated fully by the data pipeline—the question is only of how much of it can be executed on the server, where it would (usually) be more efficient to do so.
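Plain LINQ has the same split (this is only an analogy, not Quino's API): everything before AsEnumerable() is translated by the provider, everything after it is evaluated on the client.

// Hypothetical LINQ-to-SQL/EF-style query; FormatName stands in for logic that is
// too complex to translate to SQL.
var names = context.People
  .Where(p => p.LastName.StartsWith("S")) // mapped to the data source
  .AsEnumerable()                         // switch to client-side (local) evaluation
  .Select(p => FormatName(p))             // runs in memory on the client
  .ToList();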
Please see the article series that starts with Optimizing data access for high-latency networks for specific examples.
In this article, we’ve learned a bit about the ways in which Quino retrieves and stores data using the data pipeline. In the next part, we’ll cover the topic “Builders & Commands”.
Published by marco on 28. Feb 2015 18:36:41 (GMT-5)
In part I, we discussed applications—which provide the model and data provider—and sessions—which encapsulate high-level data context.
In this article, we’re going to take a look at the command types & inputs.
Before we can discuss how the pipeline processes a given command, we should discuss what kinds of commands the data driver supports and what kind of inputs the caller can pass to it. As you can well imagine, the data driver can be used for CRUD—to create, read, update and delete and also to refresh data.
In the top-right corner of the diagram to the right, you can see that the only input to the pipeline is an IDataCommandContext. This object comprises the inputs provided by the caller as well as command-specific state used throughout the driver for the duration of the command.
A caller initiates a command with either a query or an object graph, depending on the type of command. The following commands and inputs are supported:
A query includes information about the data to return (or delete).
FirstName %~ ‘m’
[2]—or the caller can find all people which belong to a company whose name starts with the letter “e”—Company.FirstName %~ ‘e’
. The context for these expressions is naturally the meta-class mentioned above. Additionally, the metadata/model can also include default filters to include.LastName
and then by FirstName
. More complex expressions are supported—for example, you could use the expression “{LastName}, {FirstName}”
, which sorts by a formatted string [3]—but be aware that many data stores have limited support for complex expressions in orderings. Orderings are ignored in a query when used to delete objects.Queries are a pretty big topic and we’ve only really scratched the surface so far. Quino has its own query language—QQL—the specification for which weighs in at over 80 pages, but that’s a topic for another day.
An object graph consists of a sequence of root objects and the sub-objects available along relations defined in the metadata.
It’s actually simpler than it perhaps sounds.
Let’s use the example above: a person is related to a single company, so the graph of a single person will include the company as well (if the object is loaded and/or assigned). Additionally, the company defines a relation that describes the list of people that belong to it. The person=>company relationship is complementary to the company=>person relationship. We call person=>company a 1-1 relation, while company=>person is a 1-n relation.
The following code creates two new companies, assigns them to three people and saves everything at once.
var encodo = new Company { Name = "Encodo Systems AG" };
var other = new Company { Name = "Not Encodo" };
var people = new []
{
new Person { FirstName = "John", LastName = "Doe", Company = other },
new Person { FirstName = "Bob", LastName = "Smith", Company = encodo },
new Person { FirstName = "Ted", LastName = "Jones", Company = encodo }
};
Session.Save(people);
The variable people above is an object graph. The variables encodo and other are also object graphs, but only to parts of the first one. From people, a caller can look up people[0].Company, which is other. The graph contains cycles, so people[0].Company.People[0].Company is also other. From encodo, the caller can get to other people in the same company, but not to people in the other company; for example, encodo.People[0] gets “Bob Smith” and encodo.People[0].Company.People[1] gets “Ted Jones”.
As with queries, object graphs are a big topic and are strongly bound to the kind of metadata available in Quino. Another topic for another day.
Phew. We’re almost to the point where we can create an IDataCommandContext to send into the data pipeline.
We now know what an IDataSession is and know why we need it.
With those inputs, Quino has all it needs from the caller. A glance at the top-left corner of the diagram above shows us that Quino will determine an IMetaClass and an IMetaObjectHandler from these inputs and then use them to build the IDataCommandContext.
An IQuery has a MetaClass property, so that’s easy. With the meta-class and the requested type of object, the data driver checks a list of registered object-handlers and uses the first one that says it supports that type. If the input is an object graph, though, the object-handler is determined first and then the meta-class is obtained from the object-handler using a root object from the graph.
Most objects will inherit from GenericObject, which implements the IPersistable interface required by the standard object handler. However, an application is free to implement an object handler for other base classes—or no base class at all, using reflection to get/set values on POCOs. That is, however, an exercise left up to the reader.
At this point, we have all of our inputs and can create the IDataCommandContext.
In the next part, we’ll take a look at the “Data Pipeline” through which this command context travels.
Published by marco on 21. Feb 2015 08:02:16 (GMT-5)
One part of Quino that has undergone quite a few changes in the last few versions is the data driver. The data driver is responsible for CRUD: create, read, update and delete operations. One part of this is the ORM—the object-relational mapper—that marshals data to and from relational databases like PostgreSql, SQL Server and SQLite.
We’re going to cover a few topics in this series:
But first let’s take a look at an example to anchor our investigation.
An application makes a request to the data driver using commands like Save() to save data and GetObject() or GetList() to get data. How are these high-level commands executed? Quino does an excellent job of shielding applications from the details, but it’s still very interesting to know how this is achieved.
The following code snippet retrieves some data, deletes part of it and saves a new version.
using (var session = application.CreateSession())
{
  var people = session.GetList<Person>();
  people.Query.WhereEquals(Person.Fields.FirstName, "john");
  session.Delete(people);
  session.Save(new Person { FirstName = "bob", LastName = "doe" });
}
In this series, we’re going to answer the following questions…and probably many more.
Let’s tackle the last two questions first.
The application defines common configuration information. The most important bits for the ORM are as follows:
Person
, which has at least the two properties LastName
and FirstName
. There is probably an entity named Company
as well, with a one-to-many relationship to Person
. As you can imagine, Quino uses this information to formulate requests to data stores that contain data in this format. [1] For drivers that support it, Quino also uses this information in order to create that underlying data schema. [2]So that’s the application. There is a single shared application for a process.
But in any non-trivial application—and any non-desktop application—we will have multiple data requests running, possibly in different threads of execution.
That’s where sessions come in. The session encapsulates a data context, which contains the following information:
HttpContext.Current.User
but generalized to be available in any Quino application. All data requests over a session are made in the context of this user.If we go back to the original code sample, we now know that creating a new session with CreateSession()
creates a new data context, with its own user and its own data cache. Since we didn’t pass in any credentials, the session uses the default credentials for the application. [3] All data access made on that session is nicely shielded and protected from any data access made in other sessions (where necessary, of course).
So now we’re no closer to knowing how Quino works with data on our behalf, but we’ve taken the first step: we know all about one of the main inputs to the data driver, the session.
In the next part, we’ll cover the topic “The Data Pipeline”.
Person.Fields.FirstName
in the example), or view models, DTOs or even client-side TypeScript definitions. We also use the model to generate user interfaces—both for entire desktop-application interfaces but also for HTML helpers to build MVC views.This is code that you might use in a single-user application. In a server application, you would most likely just use the session that was created for your request by Quino. If an application wants to create a new session, but using the same user as an existing session, it would call:
var requestCredentials = requestSession.AccessControl.CurrentUser.CreateCredentials();
using (var session = application.CreateSession(requestCredentials))
{
// Work with session
}
Published by marco on 11. Feb 2015 07:11:51 (GMT-5)
We’ve been using ReSharper at Encodo since version 4. And we regularly use a ton of other software from JetBrains [1]—so we’re big fans.
As long-time users of ReSharper, we’ve become accustomed to the following pattern of adoption for new major versions:
This process can take anywhere from several weeks to a couple of months. The reason we do it almost every time is that the newest version of ReSharper almost always has a few killer features. For example, version 8 had initial TypeScript support. Version 9 carries with it a slew of support improvements for Gulp, TypeScript and other web technologies.
Unfortunately, if you need to continue to use the test-runner with C#, you’re in for a bumpy ride.
Any new major version of ReSharper can be judged by its test runner. The test runner seems to be rewritten from the ground-up in every major version. Until the test runner has settled down, we can’t really use that version of ReSharper for C# development.
The 6.x and 7.x versions were terrible at the NUnit TestCase and Values attributes. They were so bad that we actually converted tests back from using those attributes. While 6.x had trouble reliably compiling and executing those tests, 7.x was better at noticing that something had changed without forcing the user to manually rebuild everything.
Unfortunately, this new awareness in 7.x came at a cost: it slowed editing in larger NUnit fixtures down to a crawl, using a tremendous amount of memory and sending VS into a 1.6GB+ memory-churn that made you want to tear your hair out.
8.x fixed all of this and, by 8.2.x, was a model of stability and usefulness, getting the hell out of the way and reliably compiling, displaying and running tests.
And then along came 9.x, with a whole slew of sexy new features that just had to be installed. I tried the new features and they were good. They were fast. I was looking forward to using the snazzy new editor to create our own formatting template. ReSharper seemed to be using less memory, felt snappier, it was lovely.
And then I launched the test runner.
And then I uninstalled 9.x and reinstalled 8.x.
And then I needed the latest version of DotMemory and was forced to reinstall 9.x. So I tried the test runner again, which inspired this post. [2]
So what’s not to love about the test runner? It’s faster and seems much more asynchronous. However, it gets quite confused about which tests to run, how to handle test cases and how to handle abstract unit-test base classes.
Just like 6.x, ReSharper 9.x can’t seem to keep track of which assemblies need to be built based on changes made to the code and which test(s) the user would like to run.
To be fair, we have some abstract base classes in our unit fixtures. For example, we define all ORM query tests in multiple abstract test-fixtures and then create concrete descendants that run those tests for each of our supported databases. If I make a change to a common assembly and run the tests for PostgreSql, then I expect—at the very least—that the base assembly and the PostgreSql test assemblies will be rebuilt. 9.x isn’t so good at that yet, forcing you to “Rebuild All”—something that I’d no longer had to do with 8.2.x.
It’s the same with TestCases: whereas 8.x was able to reliably show changes and to make sure that the latest version was run, 9.x suffers from the same issue that 6.x and 7.x had: sometimes the test is shown as a single node without children and sometimes it’s shown with the wrong children. Running these tests results in a spinning cursor that never ends. You have to manually abort the test-run, rebuild all, reload the runner with the newly generated tests from the explorer and try again. This is a gigantic pain in the ass compared to 8.x, which just showed the right tests—if not in the runner, then at least very reliably in the explorer.
And the explorer in 9.x! It’s a hyperactive, overly sensitive, eager-to-please puppy that reloads, refreshes, expands nodes and scrolls around—all seemingly with a mind of its own! Tests wink in and out of existence, groups expand seemingly at random, the scrollbar extends and extends and extends to accommodate all of the wonderful things that the unit-test explorer wants you to see—needs for you to see. Again, it’s possible that this is due to our abstract test fixtures, but this is new to 9.x. 8.2.x is perfectly capable of displaying our tests in a far less effusive and frankly hyperactive manner.
Even the output formatting has changed in 9.x, expanding all CR/LF pairs from single-spacing to double-spacing. It’s not a deal-breaker, but it’s annoying: copying text is harder, reading stack traces is harder. How could no one have noticed this in testing?
The install/uninstall process is painless and supports jumping back and forth between versions quite well, so I’ll keep trying new versions of 9.x until the test runner is as good as the one in 8.2.x is. For now, I’m back on 8.2.3. Stay tuned.
In no particular order, we have used or are using:
Published by marco on 16. Nov 2014 00:20:42 (GMT-5)
In the previous article, I explained how we were using NDepend to clean up dependencies and the architecture of our Quino framework. You have to start somewhere, so I started with the two base assemblies: Quino and Encodo. Encodo only has dependencies on standard .NET assemblies, so let’s start with that one.
The first step in cleaning up the Encodo assembly is to remove dependencies on the Tools namespace. There seems to be some confusion as to what belongs in the Core namespace versus what belongs in the Tools namespace.
There are too many low-level classes and helpers in the Tools namespace. Just as a few examples, I moved the following classes from Tools to Core:
The names kind of speak for themselves: these classes clearly belong in a core component and not in a general collection of tools.
Now, how did I decide which elements to move to core? NDepend helped me visualize which classes are interdependent.
We see that EnumerableTools depends on StringTools. I’d just moved EnumerableTools to Encodo.Core to reduce dependence on Encodo.Tools. However, since StringTools is still in the Tools namespace, the dependency remains. This is how examining dependencies really helps clarify a design: it’s now totally obvious that something as low-level as StringTools belongs in the Encodo.Core namespace and not in the Encodo.Tools namespace, which has everything but the kitchen sink in it.
Another example in the same vein is shown to the left, where we examine the dependencies of MessageTools on Encodo.Tools. The diagram explains that the colors correspond to the two dependency directions. [1]
We would like the Encodo.Messages namespace to be independent of the Encodo.Tools namespace, so we have to consider either (A) removing the references to ExceptionTools and OperatingSystemTools from MessageTools or (B) moving those two dependencies to the Encodo.Core namespace.
Choice (A) is unlikely while choice (B) beckons with the same logic as the example above: it’s now obvious that tools like ExceptionTools and OperatingSystemTools belong in Encodo.Core rather than the kitchen-sink namespace.
Once you’re done cleaning up your direct dependencies, you still can’t just sit back on your laurels. Now, you’re ready to get started looking at indirect dependencies. These are dependencies that involve more than just two namespaces that use each other directly. NDepend displays these as red bounding blocks. The documentation indicates that these are probably good component boundaries, assuming that the dependencies are architecturally valid.
NDepend can only show you information about your code but can’t actually make the decisions for you. As we saw above, if you have what appear to be strange or unwanted dependencies, you have to decide how to fix them. In the cases above, it was obvious that certain code was just in the wrong namespace. In other cases, it may simply be a few bits of code are defined at too low a level.
For example, our standard practice for components is to put high-level concepts for the component at the Encodo.<ComponentName> namespace. Then we would use those elements from sub-namespaces, like Encodo.<ComponentName>.Utils. However, we also ended up placing types that then used that sub-namespace in the upper-level namespace, like ComponentNameTools.SetUpEnvironment() or something like that. The call to SetUpEnvironment() references the Utils namespace which, in turn, references the root namespace. This is a direct dependency, but if another namespace comes between, we have an indirect dependency.
This happens quite quickly for larger components, like Encodo.Security.
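A stripped-down example of the kind of cycle described above (the namespaces and types are made up for illustration):

namespace Encodo.Widgets
{
  public class WidgetSettings { }

  public static class WidgetTools
  {
    public static void SetUpEnvironment()
    {
      // The root namespace calls into its own sub-namespace...
      Utils.EnvironmentHelper.Initialize();
    }
  }
}

namespace Encodo.Widgets.Utils
{
  public static class EnvironmentHelper
  {
    public static void Initialize()
    {
      // ...and the sub-namespace refers back to a type in the root namespace:
      // a dependency cycle between the two namespaces.
      var settings = new WidgetSettings();
    }
  }
}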
The screenshots below show a high-level snapshot of the indirect dependencies in the Encodo assembly and then also a detail view, with all sub-namespaces expanded. The detail view is much larger but shows you much more information about the exact nature of the cycle. When you select a red bounding box, another panel shows the full details and exact nature of the dependency.
After a bunch of work, I’ve managed to reduce the dependencies to a set of interfaces that are clearly far too dependent on many subsystems.
The white books for NDepend claim that “[t]echnically speaking, the task of merging the source code of several assemblies into one is a relatively light one that takes just a few hours.” However, this assumes that the code has already been properly separated into non-interdependent namespaces that correspond to components. These components can then relatively easily be extracted to separate assemblies.
The issue that I have above with the Encodo assembly is a thornier one: the interfaces themselves embody a pattern that is inherently non-decoupling. I need to change how the configuration and feedback work completely in order to decouple this code.
To that end, I’ve created an issue in the issue-tracker for Quino, QNO-4659 [2], titled “Re-examine how the configuration, feedback and application work together”. The design of these components predates our introduction of a service locator, which means it’s much more tightly coupled (as you can see above).
After some internal discussion, we’ve decided to change the design of the Encodo and Quino library support for application-level configuration and state.
Remove the generic parameters from IApplication<TConfiguration, TFeedback>, leaving us with a base interface, IApplication, that is free of generic arguments.
Any components that currently reference the properties on the ICoreConfiguration can use the service locator to retrieve an instance instead.
As you can see, while NDepend is indispensable for finding dependencies, it can—along with a good refactoring tool (we use ReSharper)—really only help you clean up the low-hanging fruit. While I started out trying to split assemblies, I’ve now been side-tracked into cleaning up an older and less–well-designed component—and that’s a very good thing.
There are some gnarly knots that will feel nearly unsolvable—but with a good amount of planning, those can be re-designed as well. As I mentioned in the previous article, though, we can do so only because we’re making a clean break from the 1.x version of Quino instead of trying to maintain backward compatibility.
It’s worth it, though: the new design already looks much cleaner and is much more easily explained to new developers. Once that rewrite is finished, the Encodo assembly should be clean and I’ll use NDepend to find good places to split up that rather large assembly into sensible sub-assemblies.
If A and B are interdependent, but A should not rely on B, you should make sure A is showing in the column. You can then examine dependencies on row B—and then remove them. This works very nicely with both direct and indirect dependencies.
Published by marco on 12. Nov 2014 22:23:25 (GMT-5)
A while back—this last spring, I believe—I downloaded NDepend to analyze code dependencies. The trial license is fourteen days; needless to say, I got only one afternoon in before I was distracted by other duties. That was enough, however, to convince me that it was worth the $375 to continue to clean up Quino with NDepend.
I decided to wait until I had more time before opening my wallet. In the meantime, however, Patrick Smacchia of NDepend approached me with a free license if I would write about my experiences using NDepend on Encodo’s blog. I’m happy to write about how I used the tool and what I think it does and doesn’t do. [1]
We started working on Quino in the fall of 2007. As you can see from the first commit, the library was super-small and comprised a single assembly.
Fast-forward seven years and Version 1.13 of Quino has 66 projects/assemblies. That’s a lot of code and it was long past time to take a more structured look at how we’d managed the architecture over the years.
I’d already opened a branch in our Quino repository called feature/dependencyChanges and checked in some changes at the beginning of July. Those changes had come as a result of the first time I used NDepend to find a bunch of code that was in the wrong namespace or the wrong assembly, architecturally speaking.
I wasn’t able to continue using this branch, though, for the following reasons.
With each Quino change and release, we try our hardest to balance backward-compatibility with maintainability and effort. If it’s easy enough to keep old functionality under an old name or interface, we do so.
We mark members and types obsolete so that users are given a warning in the compiler but can continue using the old code until they have time to upgrade. These obsolete members are removed in the next major or minor upgrade.
Developers who have not removed their references to obsolete members will at this point be greeted with compiler errors. In all cases, the user can find out from Quino’s release notes how they should fix a warning or error.
The type of high-level changes that we have planned necessitates that we make a major version-upgrade, to Quino 2.0. In this version, we have decided not to maintain backward-compatibility in the code with Obsolete attributes. However, where we do make a breaking change—either by moving code to new or different assemblies or by changing namespaces—we want to maintain a usable change-log for customers who make the upgrade. The giant commit that I’d made previously was not a good start.
Since some of these changes will be quite drastic departures in structure, we want to come up with a plan to make merging from the master branch to the feature/dependencyChanges branch safer, quicker and all-around easier.
I want to include many of the changes I started in the feature/dependencyChanges branch, but would like to re-apply those changes in the following manner:
So, now that I’m ready to start cleaning up Quino for version 2.0, I’ll re-apply the changes from the giant commit, but in smaller commits. At the same time, I’ll use NDepend to find the architectural breaks that caused me to make those changes in the first place and document a bit of that process.
I created an NDepend project and attached it to my solution. Version 1.13 of Quino has 66 projects/assemblies, of which I chose the following “core” assemblies to analyze.
I can change this list at any time. There are a few ways to add assemblies. Unfortunately, the option to “Add Assemblies from VS Solution(s)” showed only 28 of the 66 projects in the Quino solution. I was unable to determine the logic that led to the other 38 projects not being shown. When I did select the projects I wanted from the list, the assemblies were loaded from unexpected directories. For example, it added a bunch of core assemblies (e.g. Encodo.Imaging) from the src/tools/Quino.CodeGenerator/bin/ folder rather than the src/libraries/Encodo.Imaging/bin folder. I ended up just taking the references I was offered by NDepend and added references to Encodo and Quino, which it had not offered to add. [3]
Let’s take a look at the initial NDepend Dashboard.
There’s a lot of detail here. The initial impression of NDepend can be a bit overwhelming, I suppose, but you have to remember the sheer amount of interdependent data that it shows. As you can see on the dashboard, not only are there a ton of metrics, but those metrics are also tracked on a time-axis. I only have one measurement so far.
Any assemblies not included in the NDepend project are considered to be “third-party” assemblies, so you can see external dependencies differently than internal ones. There is also support for importing test-coverage data, but I haven’t tried that yet.
There are a ton of measurements in there, some of which interest me and others that don’t, or with which I disagree. For example, over 1400 warnings are in the Quino* assemblies because the base namespace—Encodo.Quino—doesn’t correspond to a file-system folder—it expects Encodo/Quino, but we use just Quino.
Another 200 warnings are to “Avoid public methods not publicly visible”, which generally means that we’ve declared public methods on internal, protected or private classes. The blog post Internal or public? by Eric Lippert (Fabulous adventures in coding) covered this adequately and came to the same conclusion that we have: you actually should make methods public if they are public within their scope.
There are some White Books about namespace and assembly dependencies that are worth reading if you’re going to get serious about dependencies. There’s a tip in there about turning off “Copy Local” on referenced assemblies to drastically increase compilation speed that we’re going to look into.
One of the white books explains how to use namespaces for components and how to “levelize” an architecture. This means that the dependency graph is acyclic—that there are no dependency cycles and that there are certainly no direct interdependencies. The initial graphs from the Encodo and Quino libraries show that we have our work cut out for us.
The first matrix shows the high-level view of dependencies in the Encodo and Quino namespaces. Click the second and third to see some initial dependency issues within the Encodo and Quino assemblies.
That’s as far as I’ve gotten so far. Tune in next time for a look at how we managed to fix some of these dependency issues and how we use NDepend to track improvement over time.
Published by marco on 12. Nov 2014 22:14:18 (GMT-5)
The long and very technical article Introducing the WebKit FTL JIT provides a fascinating and in-depth look at how a modern execution engine optimizes code for a highly dynamic language like JavaScript.
To make a long story short: the compiler(s) and execution engine optimize by profiling and analyzing code and lowering it to runtimes of ever decreasing abstraction to run as the least dynamic version possible.
What does it mean to “lower” code? A programming language has a given level of abstraction and expressiveness. Generally, the more expressive it is, the more abstracted it is from code that can actually be run in hardware. A compiler transforms or translates from one language to another.
When people started programming machines, they used punch cards. Punch cards did not require any compilation because the programmer was directly speaking the language that the computer understood.
The first layer of abstraction that most of us—older programmers—encountered was assembly language, or assembler. Assembly code still has a more-or-less one-to-one correspondence between instructions and machine-language codes but there is a bit of abstraction in that there are identifiers and op-codes that are more human-readable.
Procedural languages introduced more types of statements like loops and conditions. At the same time, the syntax was abstracted further from assembler and machine code to make it easier to express more complex concepts in a more understandable manner.
At this point, the assembler (which assembled instructions into machine op-codes) became a compiler, which “compiled” a set of instructions from the more abstract language. A compiler made decisions about how to translate these concepts, and could make optimization decisions based on registers, volatility and other settings.
In time, we’d graduated to functional, statically typed and/or object-oriented languages, with much higher levels of abstraction and much more sophisticated compilers.
Generally, a compiler still used assembly language as an intermediate format, which some may remember from their days working with C++ or Pascal compilers and debuggers. In fact, .NET languages are also compiled to IL—the “Intermediate Language”—which corresponds to the instruction set that the .NET runtime exposes. The runtime compiles IL to the underlying machine code for its processor, usually in a process called JIT—Just-In-Time compilation. That is, in .NET, you start with C#, for example, which the compiler transforms to IL, which is, in turn, transformed to assembler and then machine code by the .NET runtime.
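As a tiny illustration of lowering, here is a trivial C# method and, roughly, the IL the compiler produces for it:

static int Add(int a, int b)
{
  return a + b;
}

// Compiles to IL along these lines:
//   ldarg.0   // push a
//   ldarg.1   // push b
//   add       // add the two values on the stack
//   ret       // return the result
// The JIT then lowers this IL further to machine code for the current processor.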
A compiler and execution engine for a statically typed language can make assumptions about the types of variables. The set of possible types is known in advance and types can be checked very quickly in cases where it’s even necessary. That is, the statically typed nature of the language allows the compiler to reason about a given program without making assumptions. Certain features of a program can be proven to be true. A runtime for a statically typed language can often avoid type checks entirely. It benefits from a significant performance boost without sacrificing any runtime safety.
The main characteristic of a dynamic language like JavaScript is that variables do not have a fixed type. Generated code must be ready for any eventuality and must be capable of highly dynamic dispatch. The generated code is highly virtualized. Such a runtime will execute much more slowly than a comparable statically compiled program.
Enter the profile-driven compiler, introduced in WebKit. From the article,
“The only a priori assumption about web content that our engine makes is that past execution frequency of individual functions is a good predictor for those functions’ future execution frequency.”
Here a “function” corresponds to a particular overload of a set of instructions called with parameters with a specific set of types. That is, suppose a JavaScript function is declared with one parameter and is called once with a string and 100 times with an integer. WebKit considers this to be two function overloads and will (possibly) elect to optimize the second one because it is called much more frequently. The first overload will still handle all possible types, including strings. In this way, all possible code paths are still possible, but the most heavily used paths are more highly optimized.
“All of the performance is from the DFG’s type inference and LLVM’s low-level optimizing power. […]
“Profile-driven compilation implies that we might invoke an optimizing compiler while the function is running and we may want to transfer the function’s execution into optimized code in the middle of a loop; to our knowledge the FTL is the first compiler to do on-stack-replacement for hot-loop transfer into LLVM-compiled code.”
Depending on the level of optimization, the code contains the following broad sections:
While WebKit has included some form of profile-driven compilation for quite some time, the upcoming version is the first to carry the same optimization to LLVM-generated machine code.
I recommend reading the whole article if you’re interested in more detail, such as how they avoided LLVM compiler performance issues and how they integrated this all with the garbage collector. It’s really amazing how much of what we take for granted the WebKit JS runtime treats as “hot-swappable”. The article is quite well-written and includes diagrams of the process and underlying systems.
Published by marco on 31. Oct 2014 10:39:12 (GMT-5)
Updated by marco on 1. Nov 2014 08:44:53 (GMT-5)
The summary below describes major new features, items of note and breaking changes in Quino. The full list of issues is also available for those with access to the Encodo issue tracker.
Added CoreServiceBase, which extends the standard .NET ServiceBase. The runner is available in the Encodo.Service assembly.
Improved HttpApplicationBase, especially in situations where the application fails to start. Error-page handling was also improved, including handling for Windows Event Log errors.
Updated the Encodo.Core namespace to use annotations like NotNull and CanBeNull with parameters and results. (QNO-4508)
Generated code now includes a property that returns a ValueListObject for each enum property in the metadata. For example, for a property named State of type CoreState, the generated code includes the former properties for the enum and the foreign key backing it, but now also includes the ValueListObject property. This new property provides easy access to the captions.
public CoreState State { … }
public ValueListObject StateObject { … }
public int? CoreStateIdId { … }
Improved the NAnt fix command in the default build tools to fix the assembly name as well. The build tools are available in bin/tools/build. See the src/demo/Demo.build file for an example of how to use the NAnt build scripts for your own solutions. To change the company name used by the “fix” command, for example, add the following task override:
<target name="fix.before">
<call target="fix.before.base"/>
<property name="InfoCompanyName" value="Foobar Corporation"/>
</target>
Improved IntegrateRemotableMethods to avoid a race condition with remote methods. Also improved the stability of the DataProvider statistics. (QNO-4599)
TRight has been removed from all classes and interfaces in the Encodo.Security.* namespace. In order to fix this code, just remove the int generic parameter wherever it was used. For example, where before you used the interface IUser<int>, you should now use IUser (QNO-4576).
MetaAccessControl.DoGetAccessChecker() has been renamed to MetaAccessControl.GetAccessChecker().
Renamed Encodo.ServiceLocator.SimpleInjector.dll to Encodo.Services.SimpleInjector.dll and Quino.ServiceLocator.SimpleInjector.dll to Quino.Services.SimpleInjector.dll. Also changed the namespace Quino.ServiceLocator to Encodo.Quino.Services.
Renamed HttpApplicationBase.StartMetaApplication() to CreateAndStartUpApplication().
Properties may no longer use the names of properties defined on IMetaReadable (e.g. Deleted, Persisted). The model will no longer validate until the properties have been renamed and the code regenerated. (QNO-4185)
Removed StandardIntRights with integer constants and replaced it with StandardRights with string constants.
IAccessControl.Check() and other related methods now accept a sequence of string rights rather than integers.
IMetaConfiguration.ConfigureSession() has been deprecated. The method will still be called but may have undesired side-effects, depending on why it was overridden. The common use was to initialize a custom AccessControl for the session. Continuing to do so may overwrite the current user set by the default Winform startup. Instead, applications should use the IDataSessionAccessControlFactory and IDataSessionFactory to customize the data sessions and access controls returned for an application. In order to attach an access control, take care to only set your custom access control for sessions that correspond to your application model. [1]
internal class JobVortexDataSessionAccessControlFactory : DataSessionAccessControlFactory
{
public override IAccessControl CreateAccessControl(IDataSession session)
{
if (session.Application.Model.MetaId == JobVortexModelGenerator.ModelGuid)
{
return new JobVortexAccessControl(session);
}
return base.CreateAccessControl(session);
}
}
The default length of the UserModule.User.PasswordHash property has been increased from 100 characters to 1000. This default is more sensible for implementations that use much longer validation tokens instead of passwords. To avoid the schema migration, revert the change by setting the property’s default length back to 100 in your application model, after importing the security module, as shown below.
var securityModule = Builder.Include<SecurityModuleGenerator>();
securityModule.Elements.Classes.User.Properties[
Encodo.Quino.Models.Security.Classes.SecurityUser.Fields.PasswordHash
].MaximumSize = 100;
Application.Credentials has been removed. To fix references, retrieve the IUserCredentialsManager from the service locator. For example, the following code returns the current user:
Session.Application.Configuration.ServiceLocator.GetInstance<IUserCredentialsManager>().Current
If your application uses the WinformMetaConfigurationTools.IntegrateWinformPackages() or WinformDxMetaConfigurationTools.IntegrateWinformDxPackages(), then the IDataSession.AccessControl.CurrentUser will continue to be set correctly. If not, add the SingleUserApplicationConfigurationPackage to your application’s configuration. The user in the remoting server will be set up correctly. Add the WebApplicationConfigurationPackage to web applications in order to ensure that the current user is set up correctly for each request. (QNO-4596)
IDataSession.SyncRoot has been removed as it was no longer needed or used in Quino itself. Sessions should not be used in multiple threads, so there is no need for a SyncRoot. Code that uses it should be reworked to use a separate session for each thread.
Moved IMetaApplication.CreateSession() to an extension method. Add Encodo.Quino.App to the using clauses to fix any compile errors.
Removed IMetaApplication.DataProvider; use IMetaApplication.Configuration.DataProvider instead. (QNO-4604)
ISchemaChange and its descendants have been completely removed. ISchemaAction is no longer part of the external API, although it is still used internally. The ISchemaChangeFactory has been renamed to ISchemaCommandFactory and, instead of creating change objects, which are then applied directly, returns ISchemaCommand objects, which can be either executed or transformed in some other way. IMigrateToolkit.GetActionFor() has also been replaced with CreateCommands(), which mirrors the rest of the API by returning a sequence of commands to address a given ISchemaDifference. This release still has some commands that cannot be transformed to pure SQL, but the goal is to be able to generate pure SQL for a schema migration. (QNO-993, QNO-4579, QNO-4581, QNO-4588, QNO-4591, QNO-4594)
IMigrateSchemaAspect.Apply() has been removed. All aspects will have to be updated to implement GetCommands() instead, or to use one of the available base classes, like UpdateDataAspectBase or ConvertPropertyTypeSchemaAspect. The following example shows how to use the UpdateDataAspectBase to customize migration for a renamed property.
internal class ArchivedMigrationAspect : UpdateDataAspectBase
{
public ArchivedMigrationAspect()
: base("ArchivedMigrationAspect", DifferenceType.RenamedProperty, ChangePhase.Instead)
{
}
protected override void UpdateData(IMigrateContext context, ISchemaDifference difference)
{
using (var session = context.CreateSession(difference))
{
session.ChangeAndSaveAll<Project>(UpdateArchivedFlag);
}
}
private void UpdateArchivedFlag(Project obj)
{
obj.Archived = !obj.Archived;
}
}
The base aspects should cover most needs; if your functionality is completely customized, you can easily pass your previous implementation of Apply() to a DelegateSchemaCommand and return that from your implementation of GetCommands(). See the implementation of UpdateDataAspectBase for more examples. (QNO-4580)
MetaObjectIdEqualityComparer<T> can no longer be constructed directly. Instead, use MetaObjectIdEqualityComparer<Project>.Default.
Renamed MetaClipboardControlDx.UpdateColorSkinaware() to MetaClipboardControlDx.UpdateSkinAwareColors().
IMetaUnique.LogicalParent has been moved to IMetaBase. Since IMetaUnique inherits from IMetaBase, it is unlikely that code is affected (unless reflection or some other direct means was used to reference the property). (QNO-4586)
IUntypedMessage has been removed; the AssociatedObject formerly found there has been moved to IMessage.
ITypedMessage.AssociatedObject has been renamed to ITypedMessage.TypedAssociatedObject. (QNO-4647)
Renamed MetaObjectTools to MetaReadableTools.
GenericObject.GetAsGuid() and GenericObject.GetAsGuidDefault are now available as extension methods in MetaWritableTools.
IMetaFeedback.CreateGlobalContext() has been removed. Instead, the IGlobalContext is created using the service locator.
Published by marco on 24. Oct 2014 12:26:25 (GMT-5)
Quino is a metadata framework for .NET. It provides a means of defining an application-domain model in the form of metadata objects. Quino also provides many components and support libraries that work with that metadata to automate many services and functions. A few examples are an ORM, schema migration, automatically generated user interfaces and reporting tools.
The component we’re going to discuss is the automated schema-migration for databases. A question that recently came up with a customer was: what do all of the options mean in the console-based schema migrator?
Here’s the menu you’ll see in the console migrator:
Advanced Options
(1) Show migration plan
(2) Show significant mappings
(3) Show significant mappings with unique ids
(4) Show all mappings
(5) Show all mappings with unique ids
Main Options
(R) Refresh status
(M) Migrate database
(C) Cancel
The brief summary is:
The other advanced options are more for debugging the migration recommendation if something looks wrong. In order to understand what that means, we need to know what the migrator actually does.
The initial database-import and final command-generation parts of migration are very database-specific. The determination of differences is also partially database-specific (e.g. some databases do not allow certain features so there is no point in detecting a difference that cannot ever be repaired). The rest of the migration logic is database-independent.
The migrator works with two models: the target model and a source model.
Given these two models, the “mapping builder” creates a mapping. In the current implementation of Quino, there is no support for allowing the user to adjust mapping before a migration plan is built from it. However, it would be possible to allow the user to verify and possibly adjust the mapping. Experience has shown that this is not necessary. Anytime we thought we needed to adjust the mapping, the problem was instead that the target model had been configured incorrectly. That is, each time we had an unexpected mapping, it led us directly to a misconfiguration in the model.
The options to show mappings are used to debug exactly such situations. Before we talk about mapping, though, we should talk about what we mean by “unique ids”. Every schema-relevant bit of metadata in a Quino model is associated with a unique id, in the form of a Guid and called a “MetaId” in Quino.
What happens when the import handler generates a model?
The importer runs in two phases:
A Quino application named “demo” will have the following schema:
The migrator reads the following information into a “raw model”
If there is no further information in the database, then the mapper will have to use the raw model only. If, however, the database was created or is being maintained by Quino, then there is additional information stored in the metadata table mentioned above. The importer enhances the raw model with this information, in order to improve mapping and difference-recognition. The metadata table contains all of the Quino modeling information that is not reflected in a standard database schema (e.g. the aforementioned MetaId).
The data available in this table is currently:
SchemaIdentifier
For each schema element in the raw model, the importer does the following:
At this point, the imported model is ready and we can create a mapping between it and the application model. The imported model is called the source model while the application model is called the target model because we’re migrating the “source” to match the “target”.
We generate a mapping by iterating the target model:
The important decisions have already been made in the mapping phase. At this point, the migrator just generates a migration plan, which is a list of differences that must be addressed in order to update the database to match the target model.
This is the plan that is shown to the user by the various migration tools available with Quino. [2]
At this point, we can now understand what the advanced console-migrator commands mean. Significant mappings are those mappings which correspond to a difference in the database (create, drop, rename or alter).
As already stated, the advanced options are really there to help a developer see why the migrator might be suggesting a change that doesn’t correspond to expectations.
At this point, the migrator displays the list of differences that will be addressed by the migrator if the user chooses to proceed.
What happens when the user proceeds? The migrator generates database-specific commands that, when executed against the database, will modify the schema of the database. [3]
Commands are executed for different phases of the migration process. The phases are occasionally extended but currently comprise the following.
The commands are then executed and the results logged.
Afterward, the schema is imported again, to verify that there are no differences between the target model and the database. In some (always rarer) cases, there will still be differences, in which case, you can execute the new migration plan to repair those differences as well.
In development, this works remarkably well and often, without further intervention.
In some cases, there is data in the database that, while compatible with the current database schema, is incompatible with the updated schema. This usually happens when a new property or constraint is introduced. For example, a new required property is added that does not have a default value or a new unique index is added which existing data violates.
In these cases, there are two things that can be done:
In general, it’s strongly advised to perform a migration against a replica of the true target database (e.g. a production database) in order to guarantee that all potential data situations have been anticipated with custom code, if necessary.
It’s important to point out that Quino’s schema migration is considerably different from that employed by EF (which it picked up from Active Record Migrations in Ruby, often used with Ruby on Rails). In those systems, the developer generates specific migrations to move from one model version to another. There is a clear notion of upgrading versus downgrading. Quino only recognizes migrating from an arbitrary model to another arbitrary model. This makes Quino’s migration exceedingly friendly when moving between development branches, unlike EF, whose deficiencies in this area have been documented.
We use Microsoft Entity Framework (EF) Migrations in one of our... [More]
Published by marco on 20. Oct 2014 15:23:19 (GMT-5)
The version of EF Migrations discussed in this article is 5.0.20627. The version of Quino is less relevant: the features discussed have been supported for years. For those in a hurry, there is a tl;dr near the end of the article.
We use Microsoft Entity Framework (EF) Migrations in one of our projects where we are unable to use Quino. We were initially happy to be able to automate database-schema changes. After using it for a while, we have decidedly mixed feelings.
As developers of our own schema migration for the Quino ORM, we’re always on the lookout for new and better ideas to improve our own product. If we can’t use Quino, we try to optimize our development process in each project to cause as little pain as possible.
We ran into problems in integrating EF Migrations into a development process that uses feature branches. As long as a developer stays on a given branch, there are no problems and EF functions relatively smoothly. [1]
However, if a developer switches to a different branch—with different migrations—EF Migrations is decidedly less helpful. It is, in fact, quite cryptic and blocks progress until you figure out what’s going on.
Assume the following not-uncommon situation:
We now have the situation in which two branches have different code and each has its own database schema. Switching from one branch to another with Git quickly and easily addresses the code differences. The database is, unfortunately, a different story.
Let’s assume that developer A switches to branch feature/B to continue working there. The natural thing for A to do is to call “update-database” from the Package Manager Console [2]. This yields the following message—all-too-familiar to EF Migrations developers.
“Unable to update database to match the current model because there are pending changes and automatic migration is disabled. Either write the pending changes to a code-based migration or enable automatic migration. […]”
This situation happens regularly when working with multiple branches. It’s even possible to screw up a commit within a single branch, as illustrated in the following real-world example.
As far as you’re concerned, you committed a single field to the model. When your co-worker runs that migration, it will be applied, but EF Migrations immediately thereafter complains that there are pending model changes to make. How can that be?
Just to focus, we’re actually trying to get real work done, not necessarily debug EF Migrations. We want to answer the following questions:
The underlying reason why EF Migrations has problems is that it does not actually know what the schema of the database is. It doesn’t read the schema from the database itself, but relies instead on a copy of the EF model that it stored in the database when it last performed a successful migration.
That copy of the model is also stored in the resource file generated for the migration. EF Migrations does this so that the migration includes information about which changes it needs to apply and about the model to which the change can be applied.
If the model stored in the database does not match the model stored with the migration that you’re trying to apply, EF Migrations will not update the database. This is probably for the best, but leads us to the second question above: what do we have to do to get the database updated?
The answer has already been hinted at above: we need to fix the model stored in the database for the last migration.
Let’s take a look at the situation above in which your colleague downloaded what you thought was a clean commit.
From the Package Manager Console, run add-migration foo to scaffold a migration for the so-called “pending changes” that EF Migrations detected. That’s interesting: EF Migrations thinks that your colleague should generate a migration to drop the column that you’d only temporarily added but never checked in.
That is, the column isn’t in his database, it’s not in your database, but EF Migrations is convinced that it was once in the model and must be dropped.
How does EF Migrations even know about a column that you added to your own database but that you removed from the code before committing? What dark magic is this?
The answer is probably obvious: you did check in the change. The part that you can easily remove (the C# code) is only half of the migration. As mentioned above, the other part is a binary chunk stored in the resource file associated with each migration. These BLOBs are stored in the _MigrationHistory table in the database.
Here’s the tl;dr: generate a “fake” migration, remove all of the C# code that would apply changes to the database (shown below) and execute update-database from the Package Manager Console.
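Once the scaffolded schema changes have been deleted, the fake migration might look something like the following sketch (the class name and namespace are placeholders; add-migration generates its own):
using System.Data.Entity.Migrations;

namespace MyProject.Migrations
{
    public partial class Foo : DbMigration
    {
        public override void Up()
        {
            // Intentionally empty: no schema changes are applied.
        }

        public override void Down()
        {
            // Intentionally empty.
        }
    }
}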
This may look like it does exactly nothing. What actually happens is that it includes the current state of the EF model in the binary data for the last migration applied to the database (because you just applied it).
Once you’ve applied the migration, delete the files and remove them from the project. This migration was only generated to fix your local database; do not commit it.
Applying the fix above doesn’t mean that you won’t get database errors. If your database schema does not actually match the application model, EF will crash when it assumes fields or tables are available which do not exist in your database.
Sometimes, the only way to really clean up a damaged database—especially if you don’t have the code for the migrations that were applied there [3]—is to remove the misapplied migrations from your database, undo all of the changes to the schema (manually, of course) and then generate a new migration that starts from a known good schema.
The obvious answer to the complaint “it hurts when I do this” is “stop doing that”. We would dearly love to avoid these EF Migrations-related issues but developing without any schema-migration support is even more unthinkable.
We’d have to create upgrade scripts manually or maintain scripts to generate a working development database, and we would have to do this in each branch. When branches are merged, the database-upgrade scripts would have to be merged and tested as well. This would be a significant addition to our development process, would introduce maintainability and quality issues and would probably slow us down even more.
And we’re certainly not going to stop developing with branches, either.
We were hoping to avoid all of this pain by using EF Migrations. That EF Migrations makes us think of going back to manual schema migration is proof that it’s not nearly as elegant a solution as our own Quino schema migration, which never gave us these problems.
Quino actually reads the schema in the database and compares that model directly against the current application model. The schema migrator generates a custom list of differences that map from the current schema to the desired schema and applies them. There is user intervention but it’s hardly ever really required. This is an absolute godsend during development where we can freely switch between branches without any hassle. [4]
Quino doesn’t recognize “upgrade” versus “downgrade” but instead applies “changes”. This paradigm has proven to be a much better fit for our agile, multi-branch style of development and lets us focus on our actual work rather than fighting with tools and libraries.
Downgrade method that is generated with each migration, but perhaps someone with more experience could explain how to properly apply such a thing. If that doesn’t work, the method outlined above is your only fallback.
index.html in a modern web browser... [More]
Published by marco on 14. Sep 2014 16:09:45 (GMT-5)
On Wednesday, August 27th, Tymon gave the rest of Encodo [1] a great introduction to PowerShell. I’ve attached the presentation but a lot of the content was in demonstrations on the command-line.
index.html in a modern web browser (Chrome/Opera/Firefox work the best; IE has some rendering issues)
We learned a few very interesting things:
get-command and get-member than the GUI.
The easiest way to integrate PowerShell into your workflow is to make it eminently accessible by installing ConEmu. ConEmu is a Windows command-line with a tabbed interface and offers a tremendous number of power-user settings and features. You can tweak it to your heart’s content.
I set mine up to look like the one that Tymon had in the demonstrations (shown on my desktop to the right).
The reason we frown on returning null from a method that returns a list or sequence is that we want to be able to freely use these sequences or lists in a functional manner.
It seems to me that the... [More]
Published by marco on 8. Aug 2014 10:20:08 (GMT-5)
I’ve seen a bunch of articles addressing this topic of late, so I’ve decided to weigh in.
The reason we frown on returning null from a method that returns a list or sequence is that we want to be able to freely use these sequences or lists in a functional manner.
It seems to me that the proponents of “no nulls” are generally those who have a functional language at their disposal and the antagonists do not. In functional languages, we almost always return sequences instead of lists or arrays.
In C# and other languages with functional features, we want to be able to do this:
var names = GetOpenItems()
.Where(i => i.OverdueByTwoWeeks)
.SelectMany(i => i.GetHistoricalAssignees()
.Select(a => new { a.FirstName, a.LastName })
);
foreach (var name in names)
{
Console.WriteLine("{1}, {0}", name.FirstName, name.LastName);
}
If either GetHistoricalAssignees() or GetOpenItems() might return null, then we’d have to write the code above as follows instead:
var openItems = GetOpenItems();
if (openItems != null)
{
var names = openItems
.Where(i => i.OverdueByTwoWeeks)
.SelectMany(i => (i.GetHistoricalAssignees() ?? Enumerable.Empty<Person>())
.Select(a => new { a.FirstName, a.LastName })
);
foreach (var name in names)
{
Console.WriteLine("{1}, {0}", name.FirstName, name.LastName);
}
}
This seems like exactly the kind of code we’d like to avoid writing, if possible. It’s also the kind of code that calling clients are unlikely to write, which will lead to crashes with NullReferenceExceptions. As we’ll see below, there are people that seem to think that’s perfectly OK. I am not one of those people, but I digress.
The post, Is it Really Better to ‘Return an Empty List Instead of null’? / Part 1 by Christian Neumanns (Code Project) serves as a good example of an article that seems to be providing information but is just trying to distract people into accepting it as a source of genuine information. He introduces his topic with the following vagueness.
“If we read through related questions in Stackoverflow and other forums, we can see that not all people agree. There are many different, sometimes truly opposite opinions. For example, the top rated answer in the Stackoverflow question Should functions return null or an empty object? (related to objects in general, not specifically to lists) tells us exactly the opposite:
“Returning null is usually the best idea …”
The statement “we can see that not all people agree” is a tautology. I would split the people into groups of those whose opinions we should care about and everyone else. The statement “There are many different, sometimes truly opposite opinions” is also tautological, given the nature of the matter under discussion—namely, a question that can only be answered as “yes” or “no”. Such questions generally result in two camps with diametrically opposed opinions.
As the extremely long-winded pair of articles writes: sometimes you can’t be sure of what an external API will return. That’s correct. You have to protect against those with ugly, defensive code. But don’t use that as an excuse to produce even more methods that may return null. Otherwise, you’re just part of the problem.
The second article Is it Really Better to ‘Return an Empty List Instead of null’? − Part 2 by Christian Neumanns (Code Project) includes many more examples.
I just don’t know what to say about people that write things like “Bugs that cause NullPointerExceptions are usually easy to debug because the cause and effect are short-distanced in space (i.e. location in source code) and time.” While this is kind of true, it’s also even more true that you can’t tell the difference between such an exception being caused by a savvy programmer who’s using it to his advantage and a non-savvy programmer whose code is buggy as hell.
He has a ton of examples that try to distinguish a method that returns an empty sequence from a method that cannot properly answer the question at all. This is a concern and a very real distinction to make, but the answer is not to return null to indicate nonsensical input. The answer is to throw an exception.
The method providing the sequence should not be making decisions about whether an empty sequence is acceptable for the caller. For sequences that cannot logically be empty, the method should throw an exception instead of returning null to indicate “something went wrong”.
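A minimal sketch of that distinction (the Item and Person types and the Assignees property are hypothetical, loosely based on the earlier example):
public IEnumerable<Person> GetHistoricalAssignees(Item item)
{
    if (item == null)
    {
        // Nonsensical input: fail loudly instead of returning null.
        throw new ArgumentNullException("item");
    }

    // A valid item with no assignees yields an empty sequence, never null.
    return item.Assignees ?? Enumerable.Empty<Person>();
}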
A caller may impart semantic meaning to an empty result and also throw an exception (as in his example with a cycling team that has no members). If the display of such a sequence on a web page is incorrect, then that is the fault of the caller, not of the provider of the sequence.
That some calling code makes incorrect assumptions about return values is no reason to start returning values that will make calling code crash with a NullPointerException.
All of his examples are similar: he tries to make the pure-data call to retrieve a sequence of elements simultaneously validate some business logic. That’s not a good idea. If this is really necessary, then the validity check should go in another method.
The example he cites for getting the amount from a list of PriceComponents is exactly why most aggregation functions in .NET throw an exception when the input sequence is empty. But that’s a much better way of handling it—with a precise exception—than by returning null to try to force an exception somewhere in the calling code.
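For example, with plain LINQ (not tied to his PriceComponent types):
var empty = Enumerable.Empty<decimal>();

var total = empty.Sum();                              // 0: Sum() tolerates an empty sequence
// var average = empty.Average();                     // throws InvalidOperationException ("Sequence contains no elements")
var safeAverage = empty.DefaultIfEmpty(0m).Average(); // 0: the fallback is explicit at the call site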
But the upshot for me is: I am not going to write code that, when I call it, forces me to litter other code with null-checks. That’s just ridiculous.
Because we’re talking about latency in these articles, we’d also like to... [More]
Published by marco on 8. Aug 2014 10:20:05 (GMT-5)
In the previous two articles, we managed to reduce the number of queries executed when opening the calendar of Encodo’s time-tracking product Punchclock from one very slow query per person to a single very fast query.
Because we’re talking about latency in these articles, we’d also like to clear away a few other queries that aren’t related to time entries but are still wasting time.
In particular, the queries that “Load values” for person objects look quite suspicious. These queries don’t take a lot of time to execute but they will definitely degrade performance in high-latency networks. [1]
As we did before, we can click on one of these queries to show the query that’s being loaded. In the screenshot below, we see that the person’s picture is being loaded for each person in the drop-down list.
We’re not showing pictures in the drop-down list, though, so this is an extravagant waste of time. On a LAN, we hardly notice how wasteful we are with queries; on a WAN, the product will feel…sluggish.
In order to understand the cause of these queries, you must first know that Quino allows a developer to put metadata properties into different load-groups. A load-group has the following behavior: If the value for a property in a load-group is requested on an object, the values for all of the properties in the load-group are retrieved with a single query and set on that object.
The default load-group of an object’s metadata determines the values that are initially retrieved and applied to objects materialized by the ORM.
The metadata for a person puts the “picture” property of a person into a separate load-group so that the value is not loaded by default when people objects are loaded from the data driver. This is a good balance because business logic will avoid downloading a lot of unwanted picture data by default.
Business logic that needs the pictures can either explicitly include the picture in the query or let the value be lazy-loaded by the ORM when it is accessed. The proper solution depends on the situation.
As before, we can check the stack trace of the query to figure out which application component is triggering the call. In this case, the culprit is the binding list that we are using to attach the list of people to the drop-down control.
The binding list binds the values for all of the properties in a metaclass (e.g. “person”), triggering a lazy load when it accesses the “picture” property. To avoid the lazy-load, we can create a wrapper of the default metadata for a person and remove/hide the property so that the binding list will no longer access it.
This is quite easy [2], as shown in the code below.
var personMetaClass = new WrapMetaClass(Person.Metadata);
personMetaClass.Properties.Remove(Person.MetaProperties.Picture);
var query = new Query(personMetaClass);
With this simple fix, the binding list no longer knows about the picture property, doesn’t retrieve values for that property and therefore no longer triggers any queries to lazily load the pictures from the database for each person object.
The screenshot of the statistics window below shows that we were successful. We have two main queries: one for the list of people to show in the dropdown control and one for the time entries to show in the calendar.
For completeness, here’s the code that Punchclock is using in the current version of Quino (1.11).
var personMetaClass = new WrapMetaClass(Person.Metadata);
personMetaClass.Properties.Remove(Person.MetaProperties.Picture);
var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
var query = new Query(personMetaClass);
query.CustomCommandText = new CustomCommandText();
query.CustomCommandText.SetSection(
CommandTextSections.Where,
CommandTextAction.Replace,
string.Format(
"EXISTS (SELECT id FROM {0} WHERE {1} = {2})",
accessToolkit.GetName(TimeEntry.Metadata),
accessToolkit.GetField(TimeEntry.MetaProperties.PersonId),
accessToolkit.GetField(Person.MetaProperties.Id)
)
);
var people = Session.GetList<Person>(query);
Once we fix the bug in the WhereExists join type mentioned in the previous article and add the fluent methods for constructing wrappers mentioned in the footnote below, the code will be as follows:
var personMetaClass =
Person.Metadata.
Wrap().
RemoveProperty(Person.MetaProperties.Picture);
var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
var people =
Session.GetList<Person>(
new Query(personMetaClass).
Join(Person.MetaRelations.TimeEntries, JoinType.WhereExists).
Query
);
This concludes our investigation into performance issues with Quino and Punchclock.
You may have noticed that these calls to “load values” are technically lazy-loaded but don’t seem to be marked as such in the screenshots. This was a bug in the statistics viewer that I discovered and addressed while writing this article.
This is a rather old API and hasn’t been touched with the “fluent” wand that we’ve applied in other parts of the Quino API. A nicer way of writing it would be to create extension methods called Wrap() and RemoveProperty that return the wrapper class, like so:
var personMetaClass =
Person.Metadata.
Wrap().
RemoveProperty(Person.MetaProperties.Picture);
var query = new Query(personMetaClass);
But that will have to wait for a future version of Quino.
The... [More]
Published by marco on 4. Jul 2014 09:09:05 (GMT-5)
In the previous article, we partially addressed a performance problem in the calendar of Encodo’s time-tracking product, Punchclock. While we managed to drastically reduce the amount of time taken by each query (>95% time saved), we were still executing more queries than strictly necessary.
The query that we’re trying to optimize further is shown below.
var people =
Session.GetList<Person>().
Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
ToList();
This query executes one query to get all the people and then one query per person to get the number of time entries per person. Each of these queries by itself is very fast, but a high-latency connection adds its round-trip cost to every one of them. In order to optimize further, there’s really nothing for it but to reduce the number of queries being executed.
Let’s think back to what we’re actually trying to accomplish: We want to get all people who have at least one time entry. Can’t we get the database to do that for us? Some join or existence check or something? How about the code below?
var people =
Session.GetList<Person>(
Session.CreateQuery<Person>().
Join(Person.MetaRelations.TimeEntries, JoinType.WhereExists).
Query
);
What’s happening in the code above? We’re still getting a list of people but, instead of manipulating the related TimeEntries for each person locally, we’re joining the TimeEntries relation with the Quino query Join() method and changing the join type from the default All to the restrictive WhereExists. This sounds like exactly what we want to happen! There is no local evaluation or manipulation with Linq and, with luck, Quino will be able to map this to a single query on the database.
This is the best possible query: it’s purely declarative and will be executed as efficiently as the back-end knows how.
There’s just one problem: the WhereExists join type is broken in Quino 1.11.
Never fear, though! We can still get it to work, but we’ll have to do a bit of work until the bug is fixed in Quino 1.12. The code below builds on lessons learned in the earlier article, Mixing your own SQL into Quino queries: part 2 of 2 to use custom query text to create the restriction instead of letting Quino do it.
var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
var query = Session.CreateQuery<Person>();
query.CustomCommandText = new CustomCommandText();
query.CustomCommandText.SetSection(
CommandTextSections.Where,
CommandTextAction.Replace,
string.Format(
"EXISTS (SELECT id FROM {0} WHERE {1} = {2})",
accessToolkit.GetName(TimeEntry.Metadata),
accessToolkit.GetField(TimeEntry.MetaProperties.PersonId),
accessToolkit.GetField(Person.MetaProperties.Id)
)
);
var people = Session.GetList<Person>(query);
A look at the statistics is very encouraging:
We’re down to one 29ms query for the people and an even quicker query for all the relevant time entries. [1] We can see our query text appears embedded in the SQL generated by Quino, just as we expected.
There are a few other security-related queries that execute very quickly and hardly need optimization.
We’ve come much farther in this article and we’re almost done. In the next article, we’ll quickly clean up a few other queries that are showing up in the statistics and that have been nagging us since the beginning.
Published by marco on 27. Jun 2014 10:07:40 (GMT-5)
In the previous article, we discussed a performance problem in the calendar of Encodo’s time-tracking product, Punchclock.
Instead of guessing at the problem, we profiled the application using the database-statistics window available to all Quino applications. [1] We quickly discovered that most of the slowdown stems from the relatively innocuous line of code shown below.
var people =
Session.GetList<Person>().
Where(p => p.TimeEntries.Any()).
ToList();
Before doing anything else, we should establish what the code does. Logically, it retrieves a list of people in the database who have recorded at least one time entry.
The first question we should ask at this point is: does the application even need to do this? The answer in this case is ‘yes’. The calendar includes a drop-down control that lets the user switch between the calendars for different users. This query returns the people to show in this drop-down control.
With the intent and usefulness of the code established, let’s dissect how it is accomplishing the task.
The Session.GetList<Person>() portion retrieves a list of all people from the database
The Where() method is applied locally for each object in the list [2]
The Any() method is applied to the full list of time entries
The ToList() method creates a list of all people who match the condition
Though the line of code looks innocuous enough, it causes a huge number of objects to be retrieved, materialized and retained in memory—simply in order to check whether there is at least one object.
This is a real-world example of a performance problem that can happen to any developer. Instead of blaming the developer who wrote this line of code, it’s more important to stay vigilant to performance problems and to have tools available to quickly and easily find them.
The first solution I came up with [3] was to stop creating objects that I didn’t need. A good way of doing this and one that was covered in Quino: partially-mapped queries is to use cursors instead of lists. Instead of using the generated list TimeEntries, the following code retrieves a cursor on that list’s query and materializes at most one object for the sub-query.
var people = Session.GetList<Person>().Select(p =>
{
using (var cursor = Session.CreateCursor<TimeEntry>(p.TimeEntries.Query)) [4]
{
return cursor.Any();
}
}).ToList();
A check of the database statistics shows improvement, as shown below.
Just by using cursors, we’ve managed to reduce the execution time for each query by about 75%. [5] Since all we’re interested in finding out is whether there is at least one time entry for a person, we could also ask the database to count objects rather than to return them. That should be even faster. The following code is very similar to the example above but, instead of getting a cursor based on the TimeEntries query, it gets the count.
var people =
Session.GetList<Person>().
Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
ToList();
How did we do? A check of the database statistics shows even more improvement, as shown below.
We’re now down to a few dozen milliseconds for all of our queries, so we’re done, right? A 95% reduction in query-execution time should be enough.
Unfortunately, we’re still executing just as many queries as before, even though we’re taking far less time to execute them. This is better, but still not optimal. In high-latency situations, the user is still likely to experience a significant delay when opening the calendar since each query’s execution time is increased by the latency of the connection. In a local network, the latency is negligible; on a WAN, we still have a problem.
In the next article, we’ll see if we can’t reduce the number of queries being executed.
For users of the Microsoft Entity Framework (EF), it is important to point out that Quino does not have a Linq-to-Sql mapper. That means that any Linq expressions like Where() are evaluated locally instead of being mapped to the database. There are various reasons for this but the main one is that we ended up preferring a strict boundary between the mappable query API and the local evaluation API.
Anything formulated with the query API is guaranteed to be executed by the data provider (even if it must be evaluated locally) and anything formulated with Linq is naturally evaluated locally. In this way, the code is clear in what is sent to the server and what is evaluated locally. Quino only very, very rarely issues an “unmappable query” exception, unlike EF, which occasionally requires contortions until you’ve figured out which C# formulation of a particular expression can be mapped by EF.
Published by marco on 20. Jun 2014 10:44:29 (GMT-5)
Updated by marco on 24. Jun 2014 13:27:18 (GMT-5)
Punchclock is Encodo’s time-tracking and invoicing tool. It includes a calendar to show time entries (shown to the left). Since the very first versions, it hasn’t opened very quickly. It was fast enough for most users, but those who worked with Punchclock over the WAN through our VPN have reported that it often takes many seconds to open the calendar. So we have a very useful tool that is not often used because of how slowly it opens.
That the calendar opens slowly in a local network and even more slowly in a WAN indicates that there is not only a problem with executing many queries but also with retrieving too much data.
This seemed like a solvable problem, so I fired up Punchclock in debug mode to have a look at the query-statistics window.
To set up the view shown below, I did the following:
I marked a few things on the screenshot. It’s somewhat suspicious that there are 13 queries for data of type “Person”, but we’ll get to that later. Much more suspicious is that there are 52 queries for time entries, which seems like quite a lot considering we’re showing a calendar for a single user. We would instead expect to have a single query. More queries would be OK if there were good reasons for them, but I feel comfortable in deciding that 52 queries is definitely too many.
A closer look at the details for the time-entry queries shows very high durations for some of them, ranging from a tenth of a second to nearly a second. These queries are definitely the reason the calendar window takes so long to load.
If I select one of the time-entry queries and show the “Query Text” tab (see screenshot below), I can see that it retrieves all time entries for a single person, one after another. There are almost six years of historical data in our Punchclock database and some of our employees have been around for all of them. [1] That’s a lot of time entries to load.
I can also select the “Stack Trace” tab to see where the call originated in my source code. This feature lets me pinpoint the program component that is causing these slow queries to be executed.
As with any UI-code stack, you have to be somewhat familiar with how events are handled and dispatched. In this stack, we can see how a MouseUp command bubbled up to create a new form, then a new control and finally, to trigger a call to the data provider during that control’s initialization. We don’t have line numbers but we see that the call originates in a lambda defined in the DynamicSchedulerControl constructor.
The line of code that I pinpoint as the culprit is shown below.
var people = Session.GetList<Person>().Where(p => p.TimeEntries.Any()).ToList();
This looks like a nicely declarative way of getting data, but to the trained eye of a Quino developer, it’s clear what the problem is.
In the next couple of articles, we’ll take a closer look at what exactly the problem is and how we can improve the speed of this query. We’ll also take a look at how we can improve the Quino query API to make it harder for code like the line above to cause performance problems.
Encodo just turned nine years old, but we used a different time-entry system for the first couple of years. If you’re interested in our time-entry software history, here it is:
In particular, we’d... [More]
Published by marco on 18. Jun 2014 08:10:36 (GMT-5)
Updated by marco on 8. Jun 2016 20:51:27 (GMT-5)
In the previous article, we listed a lot of questions that you should continuously ask yourself when you’re writing code. Even when you think you’re not designing anything, you’re actually making decisions that will affect either other team members or future versions of you.
In particular, we’d like to think about how we can reconcile a development process that involves asking so many questions and taking so many facets into consideration with YAGNI.
The implication of this principle is that if you aren’t going to need something, then there’s no point in even thinking about it. While it’s absolutely commendable to adopt a YAGNI attitude, not building something doesn’t mean not thinking about it and identifying potential pitfalls.
A feature or design concept can be discussed within a time-box. Allocate a fixed, limited amount of time to determine whether the feature or design concept needs to be incorporated, whether it would be nice to incorporate it or possibly to jettison it if it’s too much work and isn’t really necessary.
The overwhelming majority of time wasted on a feature is in the implementation, debugging, testing, documentation and maintenance of it, not in the design. Granted, a long design phase can be a time-sink—especially a “perfect is the enemy of the good” style of design where you’re completely blocked from even starting work. With practice, however, you’ll learn how to think about a feature or design concept (e.g. extensibility) without letting it ruin your schedule.
If you don’t try to anticipate future needs at all while designing your API, you may end up preventing that API from being extended in directions that are both logical and could easily have been anticipated. If the API is not extensible, then it will not be used and may have to be rewritten in the future, losing more time at that point rather than up front. This is, however, only a consideration you must make. It’s perfectly acceptable to decide that you currently don’t care at all and that a feature will have to be rewritten at some point in the future.
You can’t do this kind of cost-benefit analysis and risk-management if you haven’t taken time to identify the costs, benefits or risks.
At Encodo, we encourage the person who’s already spent time thinking about this problem to simply document the drawbacks and concessions and possible ideas in an issue-tracker entry that is linked to the current implementation. This allows future users, maintainers or extenders of the API to be aware of the thought process that underlies a feature. It can also help to avoid misunderstandings about what the intended audience and coverage of an API are.
The idea is to eliminate assumptions. A lot of time can be wasted when maintenance developers make incorrect assumptions about the intent of code.
If you don’t have time to do any of this, then you can write a quick note in a task list that you need to more fully document your thoughts on the code you’re writing. And you should try to do that soon, while the ideas are still relatively fresh in your mind. If you don’t have time to think about what you’re doing even to that degree, then you’re doing something wrong and need to get organized better.
That is, if you can’t think about the code you’re writing and don’t have time to document your process, even minimally, then you shouldn’t be writing that code. Either that, or you implicitly accept that others will have to clean up your mess. And “others” includes future versions of you. (E.g. the you who, six months from now, is muttering, “who wrote this crap?!?”)
As an example, we can consider how we go from a specific feature in the context of a project to thinking about where the functionality could fit in to a suite of products—that may or may not yet exist. And remember, we’re only thinking about these things. And we’re thinking about them for a limited time—a time-box. You don’t want to prevent your project from moving forward, but you also don’t want to advance at all costs.
Advancing in an unstructured way is called hacking and, while it can lead to a short-term win, it almost always leads to short-to-medium term deficits. You can still write code that is hacked and looks hacked, if that is the highest current priority, but you’re not allowed to forget that you did so. You must officially designate what you’re doing as a hot-zone of hacking so that the Hazmat team can clean it up later, if needed.
A working prototype that is hacked together just so it works for the next demonstration is great as long as you don’t think that you can take it into production without doing the design and documentation work that you initially skipped.
If you fail to document the deficits that prevent you from taking a prototype to production, then how will you address those deficits? It will cost you much more time and pain to determine the deficits after the fact. Not only that, but unless you do a very good job, it is your users that will most likely be finding deficits—in the form of bugs.
If your product is just a hacked mess of spaghetti code with no rhyme or reason, another developer will be faster and produce more reliable code by just starting over. Trying to determine the flaws, drawbacks and hacks through intuition and reverse-engineering is slower and more error-prone than just starting with a clean slate. Developers on such a project will not be able to save time—and money—by building on what you’ve already made.
Not to be forgotten is a structured approach to error-handling. The more “hacked” the code, the more stringent the error-checking should be. If you haven’t had time yet to write or test code sufficiently, then that code shouldn’t be making broad decisions about what it thinks are acceptable errors.
Fail early, fail often. Don’t try to make a hacked mess of code bullet-proof by catching all errors in an undocumented manner. Doing so is deceptive to testers of the product as well as other developers.
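A sketch of the difference (the importer class and its members are invented for the example):
public class OrderImporter
{
    // Deceptive: swallows every error, so testers and other developers never learn
    // that this code path was never finished.
    public void ImportQuietly(string path)
    {
        try
        {
            Import(path);
        }
        catch (Exception)
        {
            // ignored
        }
    }

    // Honest: unfinished or invalid paths announce themselves immediately.
    public void Import(string path)
    {
        if (string.IsNullOrEmpty(path))
        {
            throw new ArgumentException("A path is required.", "path");
        }

        throw new NotImplementedException("Only the demo scenario is implemented so far.");
    }
}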
If you’re building a demo, make sure the happy path works and stick to it during the demo. If you do have to break this rule, add the hacks to a demo-specific branch of the code that will be discarded later.
If, however, the developer can look at your code and sees accompanying notes (either in an issue tracker, as TODOs in the code or some other form of documentation), that developer knows where to start fixing the code to bring it to production quality.
For example, it’s acceptable to configure an application in code as long as you do it in a central place and you document that the intent is to move the configuration to an external source when there’s time. If a future developer finds code for support for multiple database connections and tests that are set to ignore with a note/issue that says “extend to support multiple databases”, that future developer can decide whether to actually implement the feature or whether to just discard it because it has been deprecated as a requirement.
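A minimal sketch of what such central, documented in-code configuration might look like (the settings and the issue number are placeholders):
public static class AppSettings
{
    // TODO (QNO-XXXX): move these values to an external configuration source.
    public static readonly string DatabaseHost = "localhost";
    public static readonly int CommandTimeoutInSeconds = 30;
}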
Without documentation or structure or an indication which parts of the code were thought-through and which are considered to be hacked, subsequent developers are forced to make assumptions that may not be accurate. They will either assume that hacked code is OK or that battle-tested code is garbage. If you don’t inform other developers of your intent when you’re writing the code—best done with documentation, tests and/or a cleanly designed API—then it might be discarded or ignored, wasting even more time and money.
If you’re on a really tight time-budget and don’t have time to document your process correctly, then write a quick note that you think the design is OK or the code is OK, but tell your future self or other developers what they’re looking at. It will only take you a few minutes and you’ll be glad you did—and so will they.
Published by marco on 3. Jun 2014 10:25:46 (GMT-5)
A big part of an agile programmer’s job is API design. In an agile project, the architecture is defined from on high only in broad strokes, leaving the fine details of component design up to the implementer. Even in projects that are specified in much more detail, implementers will still find themselves in situations where they have to design something.
This means that programmers in an agile team have to be capable of weighing the pros and cons of various approaches in order to avoid causing performance, scalability, maintenance or other problems as the API is used and evolves.
When designing an API, we consider some of the following aspects. This is not meant to be a comprehensive list, but should get you thinking about how to think about the code you’re about to write.
Even if you don’t have time to write tests right now, you should still build your code so that it can be tested. It’s possible that you won’t be writing the tests. Instead, you should prepare the code so that others can use it.
It’s also possible that a future you will be writing the tests and will hate you for having made it so hard to automate testing.
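One way to prepare for tests without writing them is to depend on abstractions rather than concrete services, so that a test can substitute its own implementation later. A minimal sketch, with invented types:
public interface IClock
{
    DateTime Now { get; }
}

public class InvoiceGenerator
{
    private readonly IClock _clock;

    // The clock is injected, so a test can later pass a fake implementation
    // with a fixed time instead of relying on the system clock.
    public InvoiceGenerator(IClock clock)
    {
        if (clock == null) { throw new ArgumentNullException("clock"); }

        _clock = clock;
    }

    public DateTime GetDueDate()
    {
        return _clock.Now.AddDays(30);
    }
}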
This is a very important aspect and involves how your application handles situations outside of the design.
While we’re on the subject of error-handling, I want to emphasize that this is one of the most important parts of API design, regardless of which language or environment you use. [1]
Add preconditions for all method parameters; verify them as non-null and verify ranges. Do not catch all exceptions and log them or—even worse—ignore them. This is even more important in environments—I’m looking at you, client-side web code in general and JavaScript in particular—where the established philosophy is to run anything and to never rap a programmer on the knuckles for having written really knuckle-headed code.
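A minimal sketch of the kind of precondition checks meant here (the method and its parameters are made up for illustration):
public void RegisterUser(string name, int age)
{
    if (name == null) { throw new ArgumentNullException("name"); }
    if (name.Length == 0) { throw new ArgumentException("Name must not be empty.", "name"); }
    if (age < 0 || age > 150) { throw new ArgumentOutOfRangeException("age"); }

    // The rest of the method can now rely on valid inputs instead of silently
    // working around invalid ones.
}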
You haven’t tested the code, so you don’t know what kind of errors you’re going to get. If you ignore everything, then you’ll also ignore assertions, contract violations, null-reference exceptions and so on. The code will never be improved if it never makes a noise. It will just stay silently crappy until someone notices a subtle logical error somewhere and must painstakingly track it down to your untested code.
You might say that production code shouldn’t throw exceptions. This is true, but we’re explicitly not talking about production code here. We’re talking about code that has few to no tests and is acknowledged to be incomplete. If you move code like this into production, then it’s better to crash than to silently corrupt data or degrade the user experience.
A crash will get attention and the code may even be fixed or improved. If you write code that will crash on all but the “happy path” and it never crashes? That’s great. Do not program preemptively defensively in fresh code. If you have established code that interfaces with other (possibly external) components and you sometimes get errors that you can’t work around in any other way, then it’s OK to catch and log those exceptions rather than propagating them. At least you tried.
In the next article, we’ll take a look at how all of these questions and considerations can be reconciled with YAGNI at all. Spoiler alert: we think that they can.
Published by marco on 31. May 2014 08:55:13 (GMT-5)
There’s an old problem in generated WCF clients in which the Dispose()
method calls Close()
on the client irrespective of whether there was a fault. If there was a fault, then the method should call Abort()
instead. Failure to do so causes another exception, which masks the original exception. Client code will see the subsequent fault rather than the original one. A developer running the code in debug mode will be misled as to what really happened.
You can see WCF Clients and the “Broken” IDisposable Implementation by David Barrett for a more in-depth analysis, but that’s the gist of it.
This issue is still present in the ClientBase
implementation in .NET 4.5.1. The linked article shows how you can add your own implementation of the Dispose()
method in each generated client. An alternative is to use a generic adaptor if you don’t feel like adding a custom dispose to every client you create. [1]
public class SafeClient<T> : IDisposable
where T : ICommunicationObject, IDisposable
{
public SafeClient(T client)
{
if (client == null) { throw new ArgumentNullException("client"); }
Client = client;
}
public T Client { get; private set; }
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}
protected virtual void Dispose(bool disposing)
{
if (disposing)
{
if (Client != null)
{
if (Client.State == CommunicationState.Faulted)
{
Client.Abort();
}
else
{
Client.Close();
}
Client = default(T);
}
}
}
}
To use your WCF client safely, you wrap it in the class defined above, as shown below.
using (var safeClient = new SafeClient<SystemLoginServiceClient>(new SystemLoginServiceClient(…)))
{
var client = safeClient.Client;
// Work with "client"
}
If you can figure out how to initialize your clients without passing parameters to the constructor, you could slim it down by adding a “new” generic constraint to the parameter T in SafeClient
and then using the SafeClient
as follows:
using (var safeClient = new SafeClient<SystemLoginServiceClient>())
{
var client = safeClient.Client;
// Work with "client"
}
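Such an adjusted SafeClient might look roughly like the following sketch; the cleanup mirrors the fault-aware Dispose() logic shown above.
public class SafeClient<T> : IDisposable
  where T : ICommunicationObject, IDisposable, new()
{
  public SafeClient()
  {
    Client = new T();
  }

  public T Client { get; private set; }

  public void Dispose()
  {
    if (Client != null)
    {
      // Same fault-aware cleanup as above: Abort() a faulted client, Close() otherwise.
      if (Client.State == CommunicationState.Faulted)
      {
        Client.Abort();
      }
      else
      {
        Client.Close();
      }

      Client = default(T);
    }
  }
}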
Published by marco on 31. May 2014 08:55:09 (GMT-5)
In a project that we’re working on, we’re consuming REST APIs delivered by services built by another team working for the same customer. We had a discussion about what were appropriate error codes to return for various situations. The discussion boiled down to: should a service return a 500 error code or a 400 error code when a request cannot be processed?
I took a quick look at the documentation for a couple of the larger REST API providers and they are using the 500 code only for catastrophic failure and using the 400 code for anything related to query-input validation errors.
Microsoft Azure Common REST API Error Codes
Code 400:
- The requested URI does not represent any resource on the server.
- One of the request inputs is out of range.
- One of the request inputs is not valid.
- A required query parameter was not specified for this request.
- One of the query parameters specified in the request URI is not supported.
- An invalid value was specified for one of the query parameters in the request URI.
Code 500:
- The server encountered an internal error. Please retry the request.
- The operation could not be completed within the permitted time.
- The server is currently unable to receive requests. Please retry your request.
Twitter Error Codes & Responses
Code 400:
“The request was invalid or cannot be otherwise served. An accompanying error message will explain further.”
Code 500:
“Something is broken. Please post to the group so the Twitter team can investigate.”
REST API Tutorial HTTP Status Codes
Code 400:
“General error when fulfilling the request would cause an invalid state. Domain validation errors, missing data, etc. are some examples.”
Code 500:
“A generic error message, given when no more specific message is suitable. The general catch-all error when the server-side throws an exception. Use this only for errors that the consumer cannot address from their end—never return this intentionally.”
“For input validation failure: 400 Bad Request + your optional description. This is suggested in the book “RESTful Web Services”.”
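In ASP.NET Web API terms, the convention above might look roughly like this; the controller and its validation rules are invented for illustration. Anything the caller can fix gets a 400, while unexpected exceptions are left to surface as a 500.
public class OrdersController : ApiController
{
    public IHttpActionResult Get(int id)
    {
        if (id <= 0)
        {
            // Input-validation failure: the caller can correct this, so return 400.
            return BadRequest("id must be a positive integer.");
        }

        var order = FindOrder(id);

        if (order == null)
        {
            return NotFound();
        }

        // Anything unexpected thrown below this point surfaces as a 500,
        // which the consumer cannot address from their end.
        return Ok(order);
    }

    private object FindOrder(int id)
    {
        // Placeholder for the actual data access.
        return null;
    }
}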
Published by marco on 17. Apr 2014 21:30:02 (GMT-5)
In the first installment, we covered the basics of mixing custom SQL with ORM-generated queries. We also took a look at a solution that uses direct ADO database access to perform arbitrarily complex queries.
In this installment, we will see more elegant techniques that make use of the CustomCommandText
property of Quino queries. We’ll approach the desired solution in steps, proceeding from attempt #1 – attempt #5.
tl;dr: Skip to attempt #5 to see the final result without learning why it’s correct.
An application can assign the CustomCommandText
property of any Quino query to override some of the generated SQL. In the example below, we override all of the text, so that Quino doesn’t generate any SQL at all. Instead, Quino is only responsible for sending the request to the database and materializing the objects based on the results.
[Test]
public void TestExecuteCustomCommand()
{
var people = Session.GetList<Person>();
people.Query.CustomCommandText = new CustomCommandText
{
Text = @"
SELECT ALL
""punchclock__person"".""id"",
""punchclock__person"".""companyid"",
""punchclock__person"".""contactid"",
""punchclock__person"".""customerid"",
""punchclock__person"".""initials"",
""punchclock__person"".""firstname"",
""punchclock__person"".""lastname"",
""punchclock__person"".""genderid"",
""punchclock__person"".""telephone"",
""punchclock__person"".""active"",
""punchclock__person"".""isemployee"",
""punchclock__person"".""birthdate"",
""punchclock__person"".""salary""
FROM punchclock__person WHERE lastname = 'Rogers'"
};
Assert.That(people.Count, Is.EqualTo(9));
}
This example solves two of the three problems outlined above.
Let’s see if we can address the third issue by getting Quino to format the SELECT
clause for us.
Letting Quino format the SELECT clause
The following example uses the AccessToolkit
of the IQueryableDatabase
to format the list of properties obtained from the metadata for a Person
. The application no longer makes assumptions about which properties are included in the select statement, what order they should be in or how to format them for the SQL expected by the database.
[Test]
public virtual void TestExecuteCustomCommandWithStandardSelect()
{
var people = Session.GetList<Person>();
var accessToolkit = DefaultDatabase.AccessToolkit;
var properties = Person.Metadata.DefaultLoadGroup.Properties;
var fields = properties.Select(accessToolkit.GetField);
people.Query.CustomCommandText = new CustomCommandText
{
Text = string.Format(
@"SELECT ALL {0} FROM punchclock__person WHERE lastname = 'Rogers'",
fields.FlattenToString()
)
};
Assert.That(people.Count, Is.EqualTo(9));
}
This example fixes the problem with the previous one but introduces a new problem: it no longer works with a remote application because it assumes that the client-side driver is a database with an AccessToolkit
. The next example addresses this problem.
Using a hard-coded AccessToolkit
The version below uses a hard-coded AccessToolkit
so that it doesn’t rely on the external data driver being a direct ADO database. It still makes an assumption about the database on the server but that is usually quite acceptable because the backing database for most applications rarely changes. [1]
[Test]
public void TestCustomCommandWithPostgreSqlSelect()
{
var people = Session.GetList<Person>();
var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
var properties = Person.Metadata.DefaultLoadGroup.Properties;
var fields = properties.Select(accessToolkit.GetField);
people.Query.CustomCommandText = new CustomCommandText
{
Text = string.Format(
@"SELECT ALL {0} FROM punchclock__person WHERE lastname = 'Rogers'",
fields.FlattenToString()
)
};
Assert.That(people.Count, Is.EqualTo(9));
}
We now have a version that satisfies all three conditions to a large degree. The application uses only a single query and the query works with both local databases and remoting servers. It still makes some assumptions about database-schema names (e.g. “punchclock__person” and “lastname”). Let’s see if we can clean up some of these as well.
Replacing the where clause
Instead of replacing the entire query text, an application can replace individual sections of the query, letting Quino fill in the rest of the query with its standard generated SQL. An application can append or prepend text to the generated SQL or replace it entirely. Because the condition for our query is so simple, the example below replaces the entire WHERE
clause instead of adding to it.
[Test]
public void TestCustomWhereExecution()
{
var people = Session.GetList<Person>();
people.Query.CustomCommandText = new CustomCommandText();
people.Query.CustomCommandText.SetSection(
CommandTextSections.Where,
CommandTextAction.Replace,
"lastname = 'Rogers'"
);
Assert.That(people.Count, Is.EqualTo(9));
}
That’s much nicer—still not perfect, but nice. The only remaining quibble is that the identifier lastname
is still hard-coded. If the model changes in a way where that property is renamed or removed, this code will continue to compile but will fail at run-time. This is a not insignificant problem if your application ends up using these kinds of queries throughout its business logic.
Replacing the where clause with generated field names
In order to fix this query and have a completely generic query that fails to compile should anything at all change in the model, we can mix in the technique that we used in attempts #2 and #3: using the AccessToolkit
to format fields for SQL. To make the query 100% statically checked, we’ll also use the generated metadata—LastName
—to indicate which property we want to format as SQL.
[Test]
public void TestCustomWhereExecution()
{
var people = Session.GetList<Person>();
var accessToolkit = new PostgreSqlMetaDatabase().AccessToolkit;
var lastNameField = accessToolkit.GetField(Person.MetaProperties.LastName);
people.Query.CustomCommandText = new CustomCommandText();
people.Query.CustomCommandText.SetSection(
CommandTextSections.Where,
CommandTextAction.Replace,
string.Format("{0} = 'Rogers'", lastNameField)
);
Assert.That(people.Count, Is.EqualTo(9));
}
The query above satisfies all of the conditions we outlined above. It’s clear that the condition is quite simple and that real-world business logic will likely be much more complex. For those situations, the best approach is to fall back to the direct ADO approach, mixed with Quino facilities like the AccessToolkit
as much as possible to create a fully customized SQL text.
Many thanks to Urs for proofreading and suggestions on overall structure.
Published by marco on 13. Apr 2014 17:38:59 (GMT-5)
The Quino ORM [1] manages all CrUD—Create, Update, Delete—operations for your application. This basic behavior is generally more than enough for standard user interfaces. When a user works with a single object in a window and saves it, there really isn’t that much to optimize.
A more complex editing process may include several objects at once and perhaps trigger events that create additional auditing objects. Even in these cases, there are still only a handful of save operations to execute. To keep the architecture clean, an application is encouraged to model these higher-level operations with methods in the metadata (modeled methods).
The advantage to using modeled methods is that they can be executed in an application server as well as locally in the client. When an application uses a remote application server rather than a direct connection to a database, modeled methods are executed in the service layer and therefore have much less latency to the database.
If an application needs even more optimization, then it may be necessary to write custom SQL—or even to use stored procedures to move the query into the database. Mixing SQL with an ORM can be a tricky business. It’s even more of a challenge with an ORM like that in Quino, which generates the database schema and shields the user from tables, fields and SQL syntax almost entirely.
What are the potential pitfalls when using custom query text (e.g. SQL) with Quino?
There are two approaches to executing custom code:
- Getting the underlying ADO connection and executing SQL directly
- Customizing the IQuery object using expressions, though an application can also add text directly to enhance or replace sections of the generated query
All of the examples below are taken directly from the Quino test suite. Some variables—like DefaultDatabase
—are provided by the Quino base testing classes but their purpose, types and implementation should be relatively obvious.
You can use the AdoDataConnectionTools
to get the underlying ADO connection for a given Session
so that any commands you execute are guaranteed to be executed in the same transactions as are already active on that session. If you use these tools, your ADO code will also automatically use the same connection parameters as the rest of your application without having to use hard-coded connection strings.
The first example shows a test from the Quino framework that shows how easy it is to combine results returned from another method into a standard Quino query.
[Test]
public virtual void TestExecuteAdoDirectly()
{
var ids = GetIds().ToList();
var people = Session.GetList<Person>();
people.Query.Where(Person.MetaProperties.Id, ExpressionOperator.In, ids);
Assert.That(people.Count, Is.EqualTo(9));
}
The ADO-access code is hidden inside the call to GetIds()
, the implementation for which is shown below. Your application can get the connection for a session as described above and then create commands using the same helper class. If you call CreateCommand()
directly on the ADO connection, you’ll have a problem when running inside a transaction on SQL Server. The SQL Server ADO implementation requires that you assign the active transaction object to each command. Quino takes care of this bookkeeping for you if you use the helper method.
private IEnumerable<int> GetIds()
{
using (var helper = AdoDataConnectionTools.GetAdoConnection(Session, "Name"))
{
using (var command = helper.CreateCommand())
{
command.AdoCommand.CommandText =
@"SELECT id FROM punchclock__person WHERE lastname = 'Rogers'";
using (var reader = command.AdoCommand.ExecuteReader())
{
while (reader.Read())
{
yield return reader.GetInt32(0);
}
}
}
}
}
There are a few drawbacks to this approach.
In the second part, we will improve on this approach by using the CustomCommandText
property of a Quino query. This will allow us to use only a single query. We will also improve maintainability by reducing the amount of code that isn’t checked by the compiler (e.g. the SQL text above).
Stay tuned for part 2, coming soon!
Many thanks to Urs for proofreading and suggestions on overall structure.
AdoDataConnectionTools
is not available until 1.12. The functionality of this class can, however, be back-ported if necessary.
Mixing your own SQL into Quino queries: part 1 of 2
Published by marco on 28. Mar 2014 15:53:54 (GMT-5)
Updated by marco on 28. Mar 2014 15:56:09 (GMT-5)
This article discusses and compares the initial version of Java 8 and C# as of .NET 4.5.1. I have not used Java 8 and I have not tested that any of the examples—Java or C#—even compile, but they should be pretty close to valid.
Java 8 has finally been released and—drum roll, please—it has closures/lambdas, as promised! I would be greeting this as champagne-cork–popping news if I were still a Java programmer. [1] As an ex-Java developer, I greet this news more with an ambivalent shrug than with any overarching joy. It’s a sunny morning and I’m in a good mood, so I’m able to suppress what would be a more than appropriate comment: “it’s about time”.
Since I’m a C# programmer, I’m more interested in peering over the fence at the pile of goodies that Java just received for its eighth birthday and see if it got something “what I ain’t got”. I found a concise list of new features in the article Will Java 8 Kill Scala? by Ahmed Soliman and was distraught/pleased [2] to discover that Java had in fact gotten two presents that C# doesn’t already have.
As you’ll see, these two features aren’t huge and the lack of them doesn’t significantly impact design or expressiveness, but you know how jealousy works:
Jealousy doesn’t care.
Jealousy is.
I’m sure I’ll get over it, but it will take time. [3]
Java 8 introduces support for static methods on interfaces as well as default methods that, taken together, amount to functionality that is more or less what extension methods bring to C#.
In Java 8, you can define static methods on an interface, which is nice, but it becomes especially useful when combined with the keyword default
on those methods. As defined in Default Methods (Java Tutorials):
“Default methods enable you to add new functionality to the interfaces of your libraries and ensure binary compatibility with code written for older versions of those interfaces.”
In Java, you no longer have to worry that adding a method to an interface will break implementations of that interface in other jar files that have not yet been recompiled against the new version of the interface. You can avoid that by adding a default implementation for your method. This applies only to those methods where a default implementation is possible, of course.
The page includes an example but it’s relatively obvious what it looks like:
public interface ITransformer
{
string Adjust(string value);
string NewAdjust(string value)
{
return value.Replace(' ', '\t');
}
}
How do these compare with extension methods in C#?
Extension methods are nice because they allow you to quasi-add methods to an interface without requiring an implementor to actually implement them. My rule of thumb is that any method that can be defined purely in terms of the public API of an interface should be defined as an extension method rather than added to the interface.
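Using the ITransformer interface from the example above, such a method might look like the following sketch (AdjustAll is a hypothetical helper):
public static class TransformerExtensions
{
    // Written purely in terms of the public API of ITransformer, so it can live
    // outside the interface as an extension method.
    public static IEnumerable<string> AdjustAll(this ITransformer transformer, IEnumerable<string> values)
    {
        return values.Select(transformer.Adjust);
    }
}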
Java’s default methods are a twist on this concept that addresses a limitation of extension methods. What is that limitation? That the method definition in the extension method can’t be overridden by the actual implementation behind the interface. That is, the default implementation can be expressed purely in terms of the public interface, but perhaps a specific implementor of the interface would like to do that plus something more. Or would perhaps like to execute the extension method in a different way, but only for a specific implementation. There is no way to do this with extension methods.
Interface default methods in Java 8 allow you to provide a fallback implementation but also allows any class to actually implement that method and override the fallback.
Functional interfaces are a nice addition, too, and something I’ve wanted in C# for some time. Erik Meijer of Microsoft doesn’t miss an opportunity to point out that this is a must for functional languages (he’s exaggerating, but the point is taken).
Saying that a language supports functional interfaces simply means that a lambda defined in that language can be assigned to any interface with a single method that has the same signature as that lambda.
An example in C# should make things clearer:
public interface ITransformer
{
string Adjust(string value);
}
public static class Utility
{
public static void WorkOnText(string text, ITransformer transformer)
{
// Do work
}
}
In order to call WorkOnText()
in C#, I am required to define a class that implements ITransformer
. There is no other way around it. However, in a language that allows functional interfaces, I could call the method with a lambda directly. The following code looks like C# but won’t actually compile.
Utility.WorkOnText(
"Hello world",
s => s.Replace("Hello", "Goodbye cruel")
);
For completeness, let’s also see how much extra code it takes to do this in C#, which has no functional interfaces.
public class PessimisticTransformer : ITransformer
{
public string Adjust(string value)
{
return value.Replace("Hello", "Goodbye cruel");
}
}
Utility.WorkOnText(
"Hello world",
new PessimisticTransformer()
);
That’s quite a huge difference. It’s surprising that C# hasn’t gotten this functionality yet. It’s hard to see what the downside is for this feature—it doesn’t seem to alter semantics.
While it is supported in Java, there are other restrictions. The signature has to match exactly. What happens if we add an optional parameter to the interface-method definition?
public interface ITransformer
{
string Adjust(string value, ITransformer additional = null);
}
In the C# example, the class implementing the interface would have to be updated, of course, but the code at calling location remains unchanged. The functional interface’s definition is the calling location, so the change would be closer to the implementation instead of more abstracted from it.
public class PessimisticTransformer : ITransformer
{
public string Adjust(string value, ITransformer additional = null)
{
return value.Replace("Hello", "Goodbye cruel");
}
}
// Using a class
Utility.WorkOnText(
"Hello world",
new PessimisticTransformer()
);
// Using a functional interface
Utility.WorkOnText(
"Hello world",
(s, a) => s.Replace("Hello", "Goodbye cruel")
);
I would take the functional interface any day.
As a final note, Java 8 has finally acquired closures/lambdas [4] but there is a limitation on which functions can be passed as lambdas. It turns out that the inclusion of functional interfaces is a workaround for not having first-class functions in the language.
Citing the article,
“[…] you cannot pass any function as first-class to other functions, the function must be explicitly defined as lambda or using Functional Interfaces”
While in C# you can assign any method with a matching signature to a lambda variable or parameter, Java requires that the method first be assigned to a variable that is “explicitly assigned as lambda” in order to be used. This isn’t a limitation on expressiveness but may lead to clutter.
In C# I can write the following:
public static class StringExtensions
{
  public static string Twist(string value)
  {
    return new string(value.Reverse().ToArray());
  }

  public static string Alter(this string value, Func<string, string> func)
  {
    return func(value);
  }

  public static string ApplyTransformations(string value)
  {
    return value.Alter(Twist).Alter(s => new string(s.Reverse().ToArray()));
  }
}
This example shows how you can declare a Func
to indicate that the parameter is a first-class function. I can pass the Twist
function or I can pass an inline lambda, as shown in ApplyTransformations
. However, in Java, I can’t declare a Func
: only functional interfaces. In order to replicate the C# example above in Java, I would do the following:
public String twist(String value)
{
return new StringBuilder(value).reverse().toString();
}
public String alter(String value, ITransformer transformer)
{
return transformer.adjust(value);
}
public String applyTransformations(String value)
{
return alter(alter(value, s -> twist(s)), s -> new StringBuilder(s).reverse().toString());
}
Note that the Java example cannot pass Twist
directly; instead, it wraps it in a lambda so that it can be passed as a functional interface. Also, the C# example uses an extension method, which allows me to “add” methods to class string
, which is not really possible in Java.
Overall, though, while these things feel like deal-breakers to a programming-language snob [5]—especially those who have a choice as to which language to use—Java developers can rejoice that their language has finally acquired features that both increase expressiveness and reduce clutter. [6]
As a bonus, as a C# developer, I find that I don’t have to be so jealous after all.
Though I’d still really like me some functional interfaces.
Published by marco on 13. Mar 2014 21:46:59 (GMT-5)
In Quino: partially-mapped queries we took a look at how Quino seamlessly maps as much as possible to the database, while handling unmappable query components locally as efficiently as possible.
As efficiently as possible can be a bit of a weasel statement. We saw that partial application of restrictions could significantly reduce the data returned. And we saw that efficient handling of that returned data could minimize the impact on both performance and memory, keeping in mind, of course, that the primary goal is correctness.
However, as we saw in the previous article, it’s still entirely possible that even an optimally mapped query will result in an unacceptable memory-usage or performance penalty. In these cases, we need to be able to hint or warn the developer that something non-optimal is occurring. It would also be nice if the developer could indicate whether or not queries with such deficiencies should even be executed.
Why would this be necessary? Doesn’t the developer have ultimate control over which queries are called? The developer has control over queries in business-logic code. But recall that the queries that we are using are somewhat contrived in order to keep things simple. Quino is a highly generic metadata framework: most of the queries are constructed by standard components from expressions defined in the metadata.
For example, the UI may piece together a query from various sources in order to retrieve the data for a particular view. In such cases, the developer has less direct control to “repair” queries with hand-tuning. Instead, the developer has to view the application holistically and make repairs in the metadata. This is one of many reasons why Quino has local evaluation and does not simply throw an exception for partially mapped queries, as EF does.
It is, in general, far better to continue working while executing a possibly sub-optimal and performance-damaging query than it is to simply crash out. Such behavior would increase the testing requirements for generated UIs considerably. Instead, the UI always works and the developer can focus on optimization and fine-tuning in the model, using tools like the Statistics Viewer, shown to the left.
The statistics viewer shows all commands executed in an application, with a stack trace, messages (hints/warnings/info) and the original query and mapped SQL/remote statement for each command. The statistics are available for SQL-based data drivers, but also for remoting drivers for all payload types (including JSON).
The screenshot above is for the statistics viewer for Winform applications; we’ve also integrated statistics into web applications using Glimpse, a plugin architecture for displaying extra information for web-site developers. The screenshot to the right shows a preview-release version that will be released with Quino 1.11 at the end of March.
One place where an application can run into efficiency problems is when the sort order for entities is too complex to map to the server.
If a single restriction cannot be mapped to the database, we can map all of the others and evaluate the unmappable ones locally. What happens if a single sort cannot be mapped to the database? Can we do the same thing? Again, to avoid being too abstract, let’s start with an example.
var query = Session.GetQuery<Person>();
query
.Where(Person.Fields.LastName, ExpressionOperator.StartsWith, "M") // [1]
.OrderBy(Person.Fields.LastName)
.OrderBy(Person.Fields.FirstName)
.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetList(query).Count, Is.Between(100, 120));
Both of these sorts can be mapped to the server so the performance and memory hit is very limited. The ORM will execute a single query and will return data for and create about 100 objects.
Now, let’s replace one of the mappable sorts with something unmappable:
var query = Session.GetQuery<Person>();
query
.Where(Person.Fields.LastName, ExpressionOperator.StartsWith, "M") // [1]
.OrderBy(new DelegateExpression(c => c.GetObject<Person>().FirstName))
.OrderBy(Person.Fields.LastName)
.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetList(query).Count, Is.Between(100, 120));
What’s happening here? Instead of being able to map both sorts to the database, now only one can be mapped. Or can it? The primary sort can’t be mapped, so there’s obviously no point in mapping the secondary sort. Instead, all sorting must be applied locally.
What if we had been able to map the primary sort but not the secondary one? Then we could have the database apply the primary sort, returning the data partially ordered. We can apply the remaining sort in memory…but that won’t work, will it? If we only applied the secondary sort in memory, then the data would end up sorted only by that value. It turns out that, unlike restrictions, sorting is all-or-nothing. If we can’t map all sorts to the database, then we have to apply them all locally. [1]
In this case, the damage is minimal because the restrictions can be mapped and guarantee that only about 100 objects are returned. Sorting 100 objects locally isn’t likely to show up on the performance radar.
Still, sorting is a potential performance-killer: as soon as you stray from the path of standard sorting, you run the risk of either pulling all of the matching objects into memory in order to sort them locally or of taking a noticeable performance hit.
In the next article, we’ll discuss how we can extract slices from a result set—using limit
and offset
—and what sort of effect this can have on performance in partially mapped queries.
Published by marco on 6. Mar 2014 22:33:32 (GMT-5)
In Quino: an overview of query-mapping in the data driver we took a look at some of the basics of querying data with Quino while maintaining acceptable performance and memory usage.
Now we’ll take a look at what happens with partially-mapped queries. Before explaining what those are, we need a more concrete example to work with. Here’s the most-optimized query we ended up with in the previous article:
var query = Session.GetQuery<Person>();
query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetCount(query), Is.GreaterThanOrEqualTo(140000));
With so many entries, we’ll want to trim down the list a bit more before we actually create objects. Let’s choose only people whose last names start with the letter “M”.
var query = Session.GetQuery<Person>();
query
.Where(Person.Fields.LastName, ExpressionOperator.StartsWith, "M") // [1]
.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetCount(query), Is.Between(100, 120));
This is the kind of stuff that works just fine in other ORMs, like Entity Framework. Where Quino goes just a little farther is in being more forgiving when a query can be only partially mapped to the server. If you’ve used EF for anything beyond trivial queries, you’ve surely run into an exception that tells you that some portion of your query could not be mapped. [2]
Instead of throwing an exception, Quino sends what it can to the database and uses LINQ to post-process the data sent back by the database to complete the query.
Unmappable code can easily sneak in through aspects in the metadata that define filters or sorts using local methods or delegates that do not exist on the server. Instead of building a complex case, we’re going to knowingly include an unmappable expression in the query.
var query = Session.GetQuery<Person>();
query
.Where(new DelegateExpression(c => c.GetObject<Person>().LastName.StartsWith("M"))) // [3] [4]
.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetCount(query), Is.Between(100, 120));
The new expression performs the same check as the previous example, but in a way that cannot be mapped to SQL. [5] With our new example, we’ve provoked a situation where any of the following could happen:
- The ORM could end up retrieving all Person
objects from the server (all several million of them, if you’ll recall from the previous post).What happens when we evaluate the query above? With partial mapping, we know that the restriction to “IBM” will be applied on the database. But we still have an additional restriction that must be applied locally. Instead of being able to get the count from the server without creating any objects, we’re now forced to create objects in memory so that we can apply the local restrictions and only count the objects that match them all.
But as you’ll recall from the previous article, the number of matches for “IBM” is 140,000 objects. The garbage collector just gave you a dirty look again.
There is no way to further optimize this query because of the local evaluation, but there is a way to avoid another particularly nasty issue: memory bubbles.
What is a memory bubble you might ask? It describes what happens when your application is using nMB and then is suddenly using n + 100MB because you created 140,000 objects all at once. Milliseconds later, the garbage collector is thrashing furiously to clean up all of these objects—and all because you created them only in order to filter and count them. A few milliseconds after that, your application is back at nMB but the garbage collector’s fur is ruffled and it’s still trembling slightly from the shock.
The way to avoid this is to stream the objects through your analyzer one at a time rather than to create them all at once. Quino uses lazily-evaluated IEnumerable<T>
sequences throughout the data driver specifically to prevent memory bubbles.
IEnumerable<T>
sequences
Before tackling how the Quino ORM handles the Count()
, let’s look at how it would return the actual objects from this query.
SELECT
statementIEnumerable<T>
sequence that represents the result of the mapped query
Right, now we have an IEnumerable<T>
that represents the result set, but we haven’t lit the fuse on it yet.
How do we light the fuse? Well, the most common way to do so is to call ToList()
on it. What happens then?
- The IEnumerator<T> requests an element
- A row is read from the underlying IDataReader
- The driver creates a Person object from that row’s data
- The driver applies the local restriction to the Person and yields it if it matches
- The matching Person is added to the list
- Control returns to the IDataReader, which requests another row
, which requests another rowSince the decision to add all objects to a list occurs all the way at the very outer caller, it’s the caller that’s at fault for the memory bubble not the driver. [6] We’ll see in the section how to avoid creating a list when none is needed.
Using cursors to control evaluation
If we wanted to process data but perhaps offer the user a chance to abort processing at any time, we could even get an IDataCursor<T>
from the Quino ORM and control iteration ourselves.
using (var cursor = Session.CreateCursor(query))
{
foreach (var obj in cursor)
{
// Do something with obj
if (userAbortedOperation) { break; }
}
}
But back to evaluating the query above. The Quino ORM handles it like this:
- Execute the mappable portion of the query on the database, without a COUNT statement
- Stream the resulting objects and count the ones that match the local restriction
So, if a count-query cannot be fully mapped to the database, the most efficient possible alternative is to execute a query that retrieves as few objects as possible (i.e. maps as much to the server as it can) and streams those objects to count them locally.
Tune in next time for a look at how to exert more control with limit
and offset
and how those work together with partial mapping.
Use ExpressionOperator.StartsWithCI to perform the check in a case-insensitive manner instead.
The DelegateExpression simply wraps the lambda given in the constructor in a Quino expression object. The parameter c is an IExpressionContext that provides the target object, which is in this case a Person.
The restriction is applied to the LastName field.
Use Session.CreateCursor() to control evaluation yourself and create the right-sized batches of objects to count. The ChangeAndSave() extension method does exactly that to load objects in batches (size adjustable by an optional parameter) rather than one by one.
Published by marco on 24. Feb 2014 23:01:09 (GMT-5)
Updated by marco on 24. Feb 2014 23:13:04 (GMT-5)
I’ve been using CSS since pretty much its inception. It’s powerful but quite low-level and lacks support for DRY. So, I switched to generating CSS with LESS a while back. This has gone quite well and I’ve been pretty happy with it.
Recently, I was converting some older, theme stylesheets for earthli. A theme stylesheet provides no structural CSS, mostly setting text, background and border colors to let users choose the basic color set. This is a perfect candidate for LESS.
So I constructed a common stylesheet that referenced LESS variables that I would define in the theme stylesheet. Very basically, it looks like this:
@body_color: #800;
@import "theme-base";
body
{
background-color: @body_color;
}
This is just about the most basic use of LESS that even an amateur user could possibly imagine. I’m keeping it simple because I’d like to illustrate a subtlety to variables in LESS that tripped me up at first—but for which I’m very thankful. I’ll give you a hint: LESS treats variables as a stylesheet would, whereas SASS treats them as one would expect in a programming language.
Let’s expand the theme-base.less
file with some more default definitions. I’m going to define some other variables in terms of the body color so that themes don’t have to explicitly set all values. Instead, a theme can set a base value and let the base stylesheet calculate derived values. If a calculated value isn’t OK for a theme, the theme can set that value explicitly to override.
Let’s see an example before we continue.
@title_color: darken(@body_color, 25%);
@border_color: @title_color;
body
{
background-color: @body_color;
}
h2
{
color: @title_color;
border: 1px solid @border_color;
}
You’ll notice that I avoided setting a value for @body_color
because I didn’t want to override the value set previously in the theme. But then wouldn’t it be impossible for the theme to override the values for @title_color
and @border_color
? We seem to have a problem here. [1]
I want to be able to set some values and just use defaults for everything that I don’t want to override. There is a construct in SASS called !default
that does exactly this. It indicates that an assignment should only take place if the variable has not yet been assigned. [2] Searching around for an equivalent in LESS took me to this page, Add support for “default” variables (similar to !default in SASS) #1706 (GitHub). There users suggested various solutions and the original poster became ever more adamant—“Suffice it to say that we believe we need default variable setting as we’ve proposed here”—until a LESS developer waded in to state that it would be “a pointless feature in less”, which seemed harsh until an example showed that he was quite right.
The clue is further down in one of the answers:
“If users define overrides after then it works as if it had a default on it. [T]hat’s because even in the imported file it will take the last definition in the same way as css, even if defined after usage. (Emphasis added.)”
It was at this point that the lightbulb went on for me. I was thinking like a programmer where a file is processed top-down and variable values can vary depending on location in the source text. That the output of the following C# code is 12
should amaze no one.
var a = 1;
Console.Write(a);
a = 2;
Console.Write(a);
a = 3;
In fact, we would totally expect our IDE to indicate that the value in the final assignment is never used and can be removed. Using LESS variable semantics, though, where variables are global in scope [3] and assignments are treated as they are in CSS, we would get 33
as output. Why? Because the variable a
has the value 3 because that’s the last value assigned to it. That is, LESS has a cascading approach to variable assignment.
This is exactly as the developer from LESS said: stop fighting it and just let LESS do what it does best. Do you want default values? Define the defaults first, then define your override values. The overridden value will be used even when used for setting the value of another default value that you didn’t even override.
Now let’s go fix our stylesheet to use these terse semantics of LESS. Here’s a first cut at a setup that feels pretty right. I put the files in the order that you would read them so that you can see the overridden values and everything makes sense again. [4]
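// theme-variables.less (the defaults)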
@body_color: white;
@title_color: darken(@body_color, 25%);
@border_color: @title_color;
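// the theme stylesheet itself (one such file per theme; its name varies)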
@import "theme-variables";
@body_color: #800;
@import "theme-base";
body
{
background-color: @body_color;
}
h2
{
color: @title_color;
border: 1px solid @border_color;
}
You can see in the example above that the required variables are all declared, then overridden and then used. From what we learned above, we know that the value of @title_color
in the file theme-variables.less
will use a value of #800
for @body_color
because that was the last value it was assigned.
We can do better though. The example above hasn’t quite embraced the power of LESS fully. Let’s try again.
@body_color: white;
@title_color: darken(@body_color, 25%);
@border_color: @title_color;
body
{
background-color: @body_color;
}
h2
{
color: @title_color;
border: 1px solid @border_color;
}
@import "theme-base";
@body_color: #800;
Boom! That’s all you have to do. Set up everything in your base stylesheet file. Define all variables and define them in terms of each other in as convoluted a manner as you like. The final value of each value is determined before any CSS is generated.
This final version also has the added advantage that a syntax-checking IDE like JetBrains WebStorm or PHPStorm will be able to provide perfect assistance and validity checking. That wasn’t true at all for any of the previous versions, where variable declarations were in different files.
Although I was seriously considering moving away from LESS and over to SASS—because at least they didn’t leave out such a basic feature, as I had thought crossly to myself—I’m quite happy to have learned this lesson and am more happy with LESS than ever.
This is reminiscent of NAnt property tasks, where you can use the now-deprecated overwrite=“false” directive. For the curious, now you’re supposed to use unless=“${property::exists(‘property-name’)}” instead, which is just hideous.
Published by marco on 9. Feb 2014 23:08:59 (GMT-5)
The blog post/article So You Want To Write Your Own Language? by Walter Bright (Dr. Dobbs) contains a lot of interesting information, related not only to parsing, but also to runtime and framework design. Bright is well-known as the designer of the D programming language, so he’s definitely worth a read.
I thought he jumped back and forth between topics a bit, so I summarized the contents for myself below:
Bright identifies Minimizing keystrokes, easy parsing and minimizing the number of keywords as false gods. Do not waste any time trying to satisfy these requirements; instead, let them flow naturally from a good design.
Your language should consist of productions that have only a single non-terminal on the left-hand side. That is, strive to make your language context-free. [1] The implication is that you’re actually going to define the grammar rather than just winging it. This means that you can use a parser generator even though Bright says not to “bother wasting time with lexer or parser generators and other so-called ‘compiler compilers.’”
I instead agree with the article Advice on writing a programming language by Ted Kaminski (Generic Language), which advises providing a grammar that can be used with parser generators because “many of those people eager to contribute either get stuck trying and failing to build a parser or trying and failing to learn to use the daunting internals of your compiler”.
You can either make it easy for people to build compilers for your language or you can maintain a very friendly API for your own compiler. If you choose the API route, it might force you to be more disciplined, but it might also cause you no end of backwards-compatibility headaches as your compiler quickly evolves. Not only that, but you’d then have to make that API available for any number of languages and any number of platforms.
If you take the route of publishing the BNF, that may also not be enough. This is because it can still be daunting to convert a BNF to something that your compiler-generator can use, especially for non-trivial languages. Providing a grammar for a widely supported parser-generator like ANTLR [2] will give those willing to build tools for your language a good jump-start.
“Use an LR parser generator. It’ll keep your language parsable, and make it easier to change early on. When your language becomes popular enough that you have the problem of parsing errors not being friendly enough for your users, celebrate that fact and hand-roll only then.
“And then reap the benefit of all the tooling an easily parsed language gets, since you know you kept it LR(1).”
Introduce redundancy into the language definition (e.g. semicolons as line-terminators in addition to whitespace/newlines) in order to make error-message generation much easier and much more likely to produce friendly output.
Compilers can handle error messages in different ways:
In order to continue parsing/compiling after an error, the machine can take one of two approaches:
Do not re-invent the syntax for everything in your language. Instead, as Bright says, “[s]ave the divergence for features not generally seen before, which also signals the user that this is new.”
A language definition is nothing without a runtime. Bright recommends “taking the common sense approach and using an existing back end, such as the JVM, CLR, gcc, or LLVM. (Of course, I can always set you up with the glorious Digital Mars back end!)” If you can avoid writing your own back-end, you should definitely do so. Similar to the approach recommended for parsing the language: start with a stock runtime and migrate to something custom if the needs of your project warrant it (they almost certainly won’t). This is the approach taken by any number of other popular languages, like Scala.
And then there’s the library/framework that accompanies the language and, arguably, helps to define it for people. Complaints about a language are often complaints about the standard runtime library/framework for the language. Developers quickly associate them and treat them as one entity. Bright’s focus is on very low-level runtimes (such as the one for his language, D) and thus his advice focuses on fast I/O, fast and efficient memory allocation/de-allocation and robust/fast transcendental functions [3]. However, he also offers the following excellent rule of thumb for any framework:
“My general rule is if the explanation for what the function does is more lines than the implementation code, then the function is likely trivia and should be booted out.”
Published by marco on 7. Feb 2014 09:57:07 (GMT-5)
The following article was originally published on the Encodo blogs and is cross-published here.
One of the most-used components of Quino is the ORM. An ORM is an Object-Relational Mapper, which accepts queries and returns data.
This all sounds a bit abstract, so let’s start with a concrete example. Let’s say that we have millions of records in an employee database. We’d like to get some information about that data using our ORM. With millions of records, we have to be a bit careful about how that data is retrieved, but let’s continue with concrete examples.
The following example returns the correct information, but does not satisfy performance or scalability requirements. [1]
var people = Session.GetList<Person>().Where(p => p.Company.Name == "IBM"); // [2]
Assert.That(people.Count(), Is.GreaterThanOrEqualTo(140000));
What’s wrong with the statement above? Since the call to Where
occurs after the call to GetList<Person>()
, the restriction cannot possibly have been passed on to the ORM.
The first line of code doesn’t actually execute anything. It’s in the call to Count()
that the ORM and LINQ are called into action. Here’s what happens, though:
- For each row in the Person table, create a Person object
- For each Person, retrieve the associated Company object
- Check whether the Name
of the person’s company is equal to “IBM”.
The code above benefits from almost no optimization, instantiating a tremendous number of objects in order to yield a scalar result. The only side-effect that can be considered an optimization is that most of the related Company
objects will be retrieved from cache rather than from the database. So that’s a plus.
Still, the garbage collector is going to be running pretty hot and the database is going to see far more queries than necessary. [3]
Let’s try again, using Quino’s fluent querying API. [4] The Quino ORM can map much of this API to SQL. Anything that is mapped to the database is not performed locally and is, by definition, more efficient. [5]
var people = Session.GetList<Person>();
people.Query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM"); [6]
Assert.That(people.Count, Is.GreaterThanOrEqualTo(140000));
First, we get a list of people from the Session
. As of the first line, we haven’t actually gotten any data into memory yet—we’ve only created a container for results of a certain type (Person
in this case).
The default query for the list we created is to retrieve everything without restriction, as we saw in the first example. In this example, though, we restrict the Query
to only the people that work for a company called “IBM”. At this point, we still haven’t called the database.
The final line is the first point at which data is requested, so that’s where the database is called. We ask the list for the number of entries that match it and it returns an impressive number of employees.
At this point, things look pretty good. In older versions of Quino, this code would already have been sufficiently optimized. It results in a single call to the database that returns a single scalar value with everything calculated on the database. Perfect.
However, since v1.6.0 of Quino [7], the call to the property IDataList.Count
has automatically populated the list with all matching objects as well. We made this change because the following code pattern was pretty common:
var list = Session.GetList<Person>();
// Adjust query here
if (list.Count > 0)
{
// do something with all of the objects here
}
That kind of code resulted in not one, but two calls to the database, which was killing performance, especially in high-latency environments.
That means, however, that the previous example is still going to pull 140,000 objects into memory, all just to count them and add them to a list that we’re going to ignore. The garbage collector isn’t a white-hot glowing mess anymore, but it’s still throwing you a look of disapproval.
Since we know that we don’t want the objects in this case, we can get the old behavior back by making the following adjustment.
var people = Session.GetList<Person>();
people.Query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetCount(people.Query), Is.GreaterThanOrEqualTo(140000));
It would be even clearer to just forget about creating a list at all and work only with the query instead.
var query = Session.GetQuery<Person>();
query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetCount(query), Is.GreaterThanOrEqualTo(140000));
Now that’s a maximally efficient request for a number of people in Quino 1.10 as well.
Tune in next time for a look at what happens when a query can only be partially mapped to the database.
The Person class used here is generated from the application metadata rather than written by the developer, as in other frameworks.
There are different strategies for retrieving associated data. Quino does not yet support retrieving anything other than root objects. That is, the associated Company
object is not retrieved in the same query as the Person
object.
In the example in question, the first indication that the ORM has that a Company
is required is when the lambda retrieves them individually. Even if the original query had somehow indicated that the Company
objects were also desired (e.g. using something like Include(Person.Relations.Company)
as you would in EF), the most optimal mapping strategy is still not clear.
Should the mapper join the company table and retrieve that highly redundant data with each person? Or should it execute a single query for all companies and prime a cache with those? The right answer depends on the latency and bandwidth between the ORM and the database as well as myriad other conditions. When dealing with a lot of data, it’s not hard to find examples where the default behavior of even a clever ORM isn’t maximally efficient—or even very efficient at all.
As we already noted, though, the example in question does everything in memory. If we reasonably assume that the people belong to a relatively small number of companies—say qc—then the millions of calls to retrieve companies associated with people will result in a lot of cache hits and generate “only” qc + 1 queries.
The Person.Relations and Person.Fields static fields are generated with the Person class. These correspond to the application metadata and change when the metadata changes. Developers are encouraged to use these generated constants so that even metadata-based queries can be validated by the compiler.
Published by marco on 1. Feb 2014 17:09:41 (GMT-5)
A while back, I participated in an evaluation of languages that could replace JavaScript for our web front-end development language at Encodo. We took a look at two contenders: Dart and TypeScript. At the time, Dart was weaker for the following reasons:
Though TypeScript has its weaknesses (it has technically not yet hit a 1.0 release), we eventually decided to go in that direction. Tool support in both Visual Studio and ReSharper is improving steadily and has gotten quite good. We’ve had quite positive results in one larger project.
Even with Dart in our wake, I am still curious to see how people are using it. I was surprised by the claims in the article Why Dart should learn JSON while it’s still young by Max Horstmann.
Since Dart is not directly compatible with JavaScript, as TypeScript is, a given JSON-formatted string cannot simply be interpreted as a native object. Instead, you import it using a library function. This is not really a problem, though one wonders whether there are performance penalties for Dart that are not present in JavaScript/TypeScript.
Where the problem arises is in exporting JSON, which does not happen automagically. In non–client-side languages like C#, NewtonSoft’s JSON.Net library can serialize pretty much anything using reflection. JSON isn’t baked into the language, but that isn’t too surprising. However, in Dart, positioned as a contender for taking over from JavaScript as the client-side language of choice, the solution recommended even by Dart language gurus is to implement toJson()
on all objects that you want to export.
Either that, or use a probably non-optimized external library to serialize your object to JSON (likely using introspection, as JSON.Net does). I agree with the author of the blog that this is a red flag for using Dart in production projects. It’s strange that Dart doesn’t produce JSON without relying on external libraries. And the recommended library is, as of this writing, of pre-production/alpha quality—the version number is 0.1.0 and the TODO list includes a bullet point that exhorts the author to “Write tests!”.
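For contrast, here is a minimal sketch of the reflection-based approach in C# with Json.NET; the Invoice type and its values are invented for the example:
using Newtonsoft.Json;

public class Invoice
{
    public int Number { get; set; }
    public string Customer { get; set; }
}

public static class JsonDemo
{
    public static void Main()
    {
        var invoice = new Invoice { Number = 42, Customer = "Acme" };

        // No hand-written toJson() needed: Json.NET discovers the public
        // properties via reflection and produces {"Number":42,"Customer":"Acme"}.
        string json = JsonConvert.SerializeObject(invoice);

        // Deserialization is just as direct.
        var roundTripped = JsonConvert.DeserializeObject<Invoice>(json);
    }
}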
So I’m still waiting to see what becomes of Dart, but the balkiness of the current solution for generating JSON not only makes it a bit of a tough fit for many current web applications, but also makes us urge caution despite its having recently been released (1.0 came out in November 2013).
Published by marco on 1. Feb 2014 12:38:27 (GMT-5)
I have never really examined Ruby in detail, but it seems to be even more of a treasure-trove of ad-hoc features than PHP.
It takes full advantage of being evaluated at run-time to offer features that I haven’t seen in even other dynamic languages. Some of these features seem like they might be nice shortcuts but also seem like they would be difficult to optimize. Not only that, but they seem so obscure that they would likely trip up even more seasoned users of the language.
At any rate, the one I found to be most brash was methods in class definitions by bjeanes (StackOverflow). (The article is a treasure trove of other gems, no pun intended.)
The example below shows how that might work.
class RandomSubclass < [Array, Hash, String, Fixnum, Float, TrueClass].sample
end
RandomSubclass.superclass # could output one of 6 different classes.
The language allows you to call methods from the “extends” clause. The example above creates an array of classes, then calls the sample method on it to yield a base class. The actual base class is not only unknown at compile time, it is also unpredictable at runtime.
The example above is contrived and makes the feature seem like it’s only for the reckless. It’s clear that serious software would have to forbid or strictly limit the use of such a feature, but I can see where it would be useful.
For example, you may want to change your base class depending on deployment parameters. If you’re deploying to a testing or staging environment, you’ll use a base class that includes more logging, profiling and debugging code. For production, you switch to a base class that’s optimized. If the class interface remains the same, then using this feature wouldn’t be as dangerous as it initially appeared.
Still, ensuring quality and enforcing architecture in software written in such a language would require a strict development process and discipline and vigilance from all involved.
Published by marco on 5. Jan 2014 11:46:53 (GMT-5)
It’s well-known that Apple runs a walled garden. Apple makes its developers pay a yearly fee to get access to that garden. In fairness, though, they do provide some seriously nice-looking APIs for their iOS and OS X platforms. They’ve been doing this for years, as listed in the post iOS 7 only is the only sane thing to do by Tal Bereznitskey. It argues that the new stuff in iOS 7 is compelling enough to make developers consider dropping support for all older operating systems, and for pragmatic reasons: having far less of your own code to support correspondingly makes the product cheaper to maintain. It’s best to check your actual target market, but Apple users tend to upgrade very quickly and reliably, so an iOS 7-only strategy is a good option.
Among the improvements that Apple has brought in the recent past are blocks (lambdas), GCD (asynchronous execution management) and ARC (mostly automated memory management), all introduced in iOS 4 and OS X 10.6 Snow Leopard. OS X 10.9 Mavericks and iOS 7 introduced a slew of common UI improvements (e.g. AutoLayout and HTML strings for labels). [1]
To find the videos listed below, browse to WWDC 2013 Development Videos.
For the web, Apple has improved developer tools and support in Safari considerably. There are two pretty good videos demonstrating a lot of these improvements:
For non-web development, Apple has been steadily introducing libraries to provide support for common application tasks, the most interesting of which are related to UI APIs like Core Image, Core Video, Core Animation, etc.
Building on top of these, Apple presents the Sprite Kit—for building 2D animated user interfaces and games—and the Scene Kit—for building 3D animated user interfaces and games. There are some good videos demonstrating these APIs as well.
Published by marco on 29. Dec 2013 23:09:53 (GMT-5)
I recently stumbled upon some Essays from the funniest man in Microsoft Research by Raymond (Old New Thing). The essayist, James Mickens, is such a funny writer that this article will, against convention, consist mostly of citations rather than the even mix of citations and paraphrasing that I naturally consider to be much more lucid and pithy. I quote at length to do the material justice, for documentation and to ensure that you all download the PDFs to see if there is more where that came from (there is). All emphases have been added.
On the delusions of the mobile-computing world:
“Mobile computing researchers are a special kind of menace. They don’t smuggle rockets to Hezbollah, or clone baby seals and then make them work in sweatshops for pennies a day. That’s not the problem with mobile computing people. The problem with mobile computing people is that they have no shame. They write research papers with titles like “Crowdsourced Geolocation-based Energy Profiling for Mobile Devices,” as if the most urgent deficiency of smartphones is an insufficient composition of buzzwords.”
On browsing web pages:
“When I use a mobile browser to load a web page, I literally have no expectation that anything will ever happen. A successful page load is so unlikely, so unimaginable, that mobile browsers effectively exist outside of causality—the browser is completely divorced from all action verbs, and can only be associated with sad, falling-tone sentences like “I had to give up after twenty seconds.” ”
On the fragility of touchscreens:
“Note that, when I say that you will “drop” your touchscreen, I do not mean “drop” in the layperson sense of “to release from a non-trivial height onto a hard surface.” I mean “drop” in the sense of “to place your touchscreen on any surface that isn’t composed of angel feathers and the dreams of earnest schoolchildren.” Phones and tablets apparently require Planck-scale mechanical alignments, such that merely looking at the touchscreen introduces fundamental, quantum dynamical changes in the touchscreen’s dilithium crystals. Thus, if you place your touchscreen on anything, ever, you have made a severe and irreversible life mistake.”
On the sheer touchiness of touchscreens:
“On your touchscreen, your swipes will become pinches, and your pinches will become scrolls, and each one of your scrolls will become a complex thing never before seen on this earth, a leviathan meta-touch event of such breadth and complexity that your phone can only respond like Carrie White at the prom. So, your phone just starts doing stuff, all the stuff that it knows how to do, and it’s just going nuts, and your apps are closing and opening and talking to the cloud and configuring themselves in unnatural ways, and your phone starts vibrating and rumbling with its little rumble pack, and it will gently sing like a tiny hummingbird of hate, and you’ll look at the touchscreen, and you’ll see that things are happening, my god, there are so many happenings, and you’ll try to flip the phone over and take out the battery, because now you just want to kill it and move to Kansas and start over, […]”
On the uselessness of most mobile computing:
“When you purchase a mobile device, you are basically saying, “I endorse the operational inefficiency of the modern bourgeoisie lifestyle, even though I could find a rock and tie a coat hanger around it and have a better chance of having a phone conversation that doesn’t sound like two monsters arguing about German poetry.””
On flying in the early 21st century:
“The point is that flying in airplanes used to be fun, but now it resembles a dystopian bin-packing problem in which humans, carry-on luggage, and five dollar peanut bags compete for real estate while crying children materialize from the ether and make obscure demands in unintelligible, Wookie-like languages while you fantasize about who you won’t be helping when the oxygen masks descend.”
On how awesome it was being a hardware architect before things got all quantum and messy:
“Of course, pride precedes the fall, and at some point, you realize that to implement aggressive out-of-order execution, you need to fit more transistors into the same die size, but then a material science guy pops out of a birthday cake and says YEAH WE CAN DO THAT, and by now, you’re touring with Aerosmith and throwing Matisse paintings from hotel room windows, because when you order two Matisse paintings from room service and you get three, that equation is going to be balanced. It all goes so well, and the party keeps getting better. When you retire in 2003, your face is wrinkled from all of the smiles, and even though you’ve been sued by several pedestrians who suddenly acquired rare paintings as hats, you go out on top, the master of your domain. ”
On quantum-level effects in modern processors:
“They randomly switched states; they leaked voltage; they fell prey to the seductive whims of cosmic rays that, unlike the cosmic rays in comic books, did not turn you into a superhero, but instead made your transistors unreliable and shiftless, like a surly teenager who is told to clean his room and who will occasionally just spray his bed with Lysol and declare victory.”
On scaling in cores when processor speed and more transistors became too messy:
“John did what any reasonable person would do: he cloaked himself in a wall of denial and acted like nothing had happened. “Making processors faster is increasingly difficult,” John thought, “but maybe people won’t notice if I give them more processors.” This, of course, was a variant of the notorious Zubotov Gambit, named after the Soviet-era car manufacturer who abandoned its attempts to make its cars not explode, and instead offered customers two Zubotovs for the price of one […]”
On the main purpose that people have for their computers:
“Lay people use their computers for precisely ten things, none of which involve massive computational parallelism, and seven of which involve procuring a vast menagerie of pornographic data and then curating that data using a variety of fairly obvious management techniques, like the creation of a folder called “Work Stuff,” which contains an inner folder called “More Work Stuff,” where “More Work Stuff” contains a series of ostensible documentaries that describe the economic interactions between people who don’t have enough money to pay for pizza and people who aren’t too bothered by that fact. ”
A summary of the state of the world of hardware design and development:
“[…] you brought the fire down from Olympus, and the mortals do with it what they will. But now, all the easy giants were dead, and John was left to fight the ghosts that Schrödinger had left behind.”
What it’s like to be a systems (low-level) programmer:
“A systems programmer will know what to do when society breaks down, because the systems programmer already lives in a world without law.”
On why people still use C++ (or a response to the snotty question of: “why don’t you just use high-level language X instead?”)
“Why not use a modern language with garbage collection and functional programming and free massages after lunch? Here’s the answer: Pointers are real. They’re what the hardware understands. Somebody has to deal with them. You can’t just place a LISP book on top of an x86 chip and hope that the hardware learns about lambda calculus by osmosis. […] Pointers are like […] real, living things that must be dealt with so that polite society can exist. Make no mistake, I don’t want to write systems software in a language like C++. […] When it’s 3 A.M., and you’ve been debugging for 12 hours, and you encounter a virtual static friend protected volatile templated function pointer, you want to […] find the people who wrote the C++ standard and bring ruin to the things that they love.”
On being thankful for systems programmers:
“That being said, if you find yourself drinking a martini and writing programs in garbage-collected, object-oriented Esperanto, be aware that the only reason that the Esperanto runtime works is because there are systems people who have exchanged any hope of losing their virginity for the exciting opportunity to think about hex numbers and their relationships with the operating system, the hardware, and ancient blood rituals that Bjarne Stroustrup performed at Stonehenge.”
On how difficult it is to work in extremely fragile territory (rather than a safe runtime):
“Indeed, I would [have…checked the log files for errors] if I hadn’t broken every component that a logging system needs to log data. I have a network file system, and I have broken the network, and I have broken the file system, and my machines crash when I make eye contact with them. I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS.”
A backhanded swipe at the utter uselessness of many UI concerns:
“I’m glad that people are working on new kinds of bouncing icons because they believe that humanity has solved cancer and homelessness and now lives in a consequence-free world of immersive sprites.”
Published by marco on 7. Nov 2013 20:56:06 (GMT-5)
On Codecademy, you can learn to program in various languages. It starts off very slowly and is targeted at non-technical users. That’s their claim anyway—the material in the courses I looked at ramps up pretty quickly.
Anyway, the interesting thing I saw was in their introductory test. It struck me as a subtle way to get you to enter your email address. I’d just recently discussed this on a project I’m working on: how can we make it fun for the user to enter personal information? The goal is not to sell that information (not yet anyway, but who knows what the future holds), but to be able to enhance—nay, personalize—the service.
Personalizing has a bad reputation but can be very beneficial. For example, if you’re using a site for free and you’re going to see offers and advertisements anyway, isn’t it better to enter a bit of data that will increase the likelihood that those offers and ads are interesting? Each person can—and should—decide for themselves what to make public, but the answer isn’t always necessarily no.
Here they teach you how to use the “length” method by measuring your email address. Sneaky. I like it.
Even if you don’t give them an address, they re-prompt you to enter your email, but it doesn’t come across as pushy because you’re taking a test.
I thought that this was pretty subtle. Because of the context, people who would ordinarily be sensitive to giving up their email might not even notice. Why? Because they want to answer the question *correctly*. They don’t want the site to judge them for having entered something wrong, so they do as they’re told.
Is Codecademy collecting emails this way? I have no way to be sure, but they’d be silly not to.
Published by marco on 3. Nov 2013 11:17:36 (GMT-5)
The following article was originally published on the Encodo blogs and is cross-published here.
The following article outlines a solution to what may end up being a temporary problem. The conditions are very specific: no server-side logic; HTTP authentication; AppCache as it is implemented by the target platforms—Safari Mobile and Google Chrome—in late 2012/early 2013. The solution is not perfect but it’s workable. We’re sharing it here in the hope that it can help someone else or serve as a base for a better solution.
The application cache is a relatively new HTML5 feature. Web applications can use it to store local content, but different browsers apply different restrictions to the amount of space allocated per domain.
In particular, the Safari Mobile browser cannot update the application cache for files for which it must obtain authentication.
The graphic below illustrates the mechanism by which a content package in a web application can manage content updates and present them to the user.
In order to address the problems described above, the UA products use a separate version file to check for updates independent of the browser’s application-cache mechanism and to trigger this update only when authentication has been reestablished.
This approach worked relatively well for us, although we continue to refine it based on feedback and experience.
Published by marco on 21. Oct 2013 22:56:04 (GMT-5)
Updated by marco on 12. Jun 2018 20:06:15 (GMT-5)
Microsoft just recently released Visual Studio 2013, which includes Entity Framework 6 and introduces a lot of new features. It reminded me of the following query that EF generated for me, way, way back when it was still version 3.5. Here’s hoping that they’ve taken care of this problem since then.
So, the other day EF (v3.5) seemed to be taking quite a while to execute a query on SQL Server. This was a pretty central query and involved a few joins and restrictions, but wasn’t anything too wild. All of the restrictions and joins were on numeric fields backed by indexes.
In these cases, it’s always best to just fire up the profiler and see what kind of SQL is being generated by EF. It was a pretty scary thing (I’ve lost it unfortunately), but I did manage to take a screenshot of the query plan, shown below.
It doesn’t look too bad until you notice that the inset on the bottom right (the black smudgy thing) is a representation of the entire query … and that it just kept going on down the page.
Published by marco on 15. Jul 2013 01:38:50 (GMT-5)
The helpful page, Ignoring files (GitHub), taught me something I didn’t know: there’s a file you can use to ignore files in your local Git repository without changing anyone else’s repository.
Just to recap, here are the ways to ignore a file:
.gitignore (global): you can designate basic exclusion directives that apply to all repositories on your system. This file is not committed to any repository or shared with others. Execute git config --global core.excludesfile ~/.gitignore_global to set the file to ~/.gitignore_global (for example). See the linked article for sample directives.
.git/info/exclude: add directives to this file in any repository. These directives are combined with any system-global directives to form the base exclusions for that repository. This file is not committed with the repository. This is the one I’d never heard of before.
.gitignore: add a file with this name to any directory. The directives in that file are merged with those from the parent directory to define the patterns that are excluded in that directory and all child directories. This is definitely the most common way to exclude files.
git update-index --assume-unchanged path/to/file.txt: tell Git to ignore local changes to a single file. While this can be useful for legacy projects, it’s best to structure new projects so developers don’t have to rely on easily forgotten tricks like this.
Published by marco on 14. Jul 2013 23:17:53 (GMT-5)
I’ve been using CSS since its inception and use many parts of the CSS3 specification for both personal work and work I do for Encodo. Recently, I read about some length units I’d never heard of in the article CSS viewport units: vw, vh, vmin and vmax by Chris Mills (Dev.Opera).
1vw: 1% of viewport width
1vh: 1% of viewport height
1vmin: 1vw or 1vh, whichever is smaller
1vmax: 1vw or 1vh, whichever is larger
These should be eminently useful for responsive designs. While there is wide support for these new units, that support is only available in the absolute latest versions of browsers. See the article for a good example of how these can be used.
While the ones covered in the article are actually new, there are others that have existed for a while but that I’ve never had occasion to use. The Font-relative lengths: the ‘em’, ‘ex’, ‘ch’, ‘rem’ units (CSS Values and Units Module Level 3) section lists the following units:
em: This one is well-known: 1em is equal to the “computed value of the ‘font-size’ property of the element on which it is used.”
ex: Equal to the height of the letter ‘x’ in the font of the element on which it is used. This is useful when you want to size a container based on the height of a lower-case letter—i.e. tighter—rather than on the full size of the font (as you get with em).
ch: “Equal to the advance measure of the “0” (ZERO, U+0030) glyph found in the font used to render it.” Since all digits in a font should be the same width, this unit is probably useful for pages that need to measure and render numbers in a reliable vertical alignment.
rem: The same as em but always returns the value for the root element of the page rather than the current element. Elements that use this unit will all scale against a common size, independently of the font-size of their contents. The article There’s more to the CSS rem unit than font sizing by Roman Rudenko (CSS-Tricks) has a lot more information and examples, as well as an explanation of how rem can stand in for the still nascent support for vw.
Published by marco on 29. Jun 2013 17:00:15 (GMT-5)
Updated by marco on 29. Jun 2013 17:04:02 (GMT-5)
The following article was originally published on the Encodo blogs and is cross-published here.
The article Announcing the .NET Framework 4.5.1 Preview provides an incredible amount of detail about a relatively exciting list of improvements for .NET developers.
First and foremost, the Edit-and-Continue feature is now available for x64 builds as well as x86 builds. Whereas an appropriate cynical reaction is that “it’s about damn time they got that done”, another appropriate reaction is to just be happy that they will finally support x64-debugging as a first-class feature in Visual Studio 2013.
Now that they have feature-parity for all build types, they can move on to other issues in the debugger (see the list of suggestions at the end).
We haven’t had much opportunity to experience the drawbacks of the current debugger vis-à-vis asynchronous debugging, but the experience outlined in the call-stack screenshot below is one that is familiar to anyone who’s done multi-threaded (or multi-fiber, etc.) programming.
Instead of showing the actual stack location in the thread within which the asynchronous operation is being executed, the new and improved version of the debugger shows a higher-level interpretation that places the current execution point within the context of the async operation. This is much more in keeping with the philosophy of the async/await feature in .NET 4.5, which lets developers write asynchronous code in what appears to be a serial fashion. This improved readability has been carried over to the debugger now, as well.
The VS2013 debugger can now show the “direct return values and the values of embedded methods (the arguments)” for the current line. [1] Instead of manually selecting the text segment and using the Quick Watch window, you can now just see the chain of values in the “Autos” debugger pane.
“We are also releasing an update in Visual Studio 2013 Preview to provide better support for apps that indirectly depend on multiple versions of a single NuGet package. You can think of this as sane NuGet library versioning for desktop apps.”
We’ve been bitten by the afore-mentioned issue and are hopeful that the solution in Visual Studio 2013 will fill the gaps in the current release. The article describes several other improvements to the Nuget services, including integration with Windows Update for large-scale deployment. They also mentioned “a curated list of Microsoft .NET Framework NuGet Packages to help you discover these releases, published in OData format on the NuGet site”, but don’t mention whether the Nuget UI in VS2013 has been improved. The current UI, while not as awful and slow as initial versions, is still not very good for discovery and is quite clumsy for installation and maintenance.
You’re not limited to just waiting on the sidelines to see which features Microsoft decides to implement in the latest version of .NET/Visual Studio. You should head over to the User Voice for Visual Studio site to get an account and vote for the issues you’d like them to work on next.
Here’s a list of the ones I found interesting, and some of which I’ve voted on.
Published by marco on 8. Jun 2013 09:43:11 (GMT-5)
Updated by marco on 9. Jun 2013 09:38:28 (GMT-5)
The following article was originally published on the Encodo blogs and is cross-published here.
Many improvements have been made to Microsoft’s Entity Framework (EF) since I last used it in production code. In fact, we’d last used it waaaaaay back in 2008 and 2009 when EF had just been released. Instead of EF, I’ve been using the Quino ORM whenever I can.
However, I’ve recently started working on a project where EF5 is used (EF6 is in the late stages of release, but is not yet generally available for production use). Though I’d been following the latest EF developments via the ADO.Net blog, I finally had a good excuse to become more familiar with the latest version through some hands-on experience.
Entity Framework: Be Prepared was the first article I wrote about working with EF. It’s quite long and documents the pain of using a 1.0 product from Microsoft. That version supported only a database-first approach, the designer was slow and the SQL generated by the mapper was quite primitive. Most of the tips and advice in the linked article, while perhaps amusing, are no longer necessary (especially if you’re using the Code-first approach, which is highly recommended).
Our next update, The Dark Side of Entity Framework: Mapping Enumerated Associations, discusses a very specific issue related to mapping enumerated types in an entity model (something that Quino does very well). This shortcoming in EF has also been addressed but I haven’t had a chance to test it yet.
Our final article was on performance, Pre-generating Entity Framework (EF) Views, which, while still pertinent, no longer needs to be done manually (there’s an Entity Framework Power Tools extension for that now).
So let’s just assume that that was the old EF; what’s the latest and greatest version like?
Well, as you may have suspected, you’re not going to get an article about Code-first or database migrations. [1] While a lot of things have been fixed and streamlined to be not only much more intuitive but also work much more smoothly, there are still a few operations that aren’t so intuitive (or that aren’t supported by EF yet).
One such operation is deleting multiple objects in the database. It’s not that it’s not possible; it’s that the only solution that immediately presents itself is to load the matching objects into memory, remove them from the context one at a time and then save the changes.
The following code illustrates this pattern for a hypothetical list of users.
var users = context.Users.Where(u => u.Name == "John");
foreach (var u in users)
{
context.Users.Remove(u);
}
context.SaveChanges();
This seems somewhat roundabout and quite inefficient. [2]
While the method above is fine for deleting a small number of objects—and is quite useful when removing different types of objects from various collections—it’s not very useful for a large number of objects. Retrieving objects into memory only to delete them is neither intuitive nor logical.
The question is: is there a way to tell EF to delete objects based on a query from the database?
I found an example attached as an answer to the post Simple delete query using EF Code First (Stack Overflow). The gist of it is shown below.
context.Database.SqlQuery<User>(
"DELETE FROM Users WHERE Name = @name",
new [] { new SqlParameter("@name", "John") }
);
To be clear right from the start, using raw SQL strings like this is already sub-optimal because the identifiers are not statically checked. This query will cause a run-time error if the model changes so that the “Users” table no longer exists or the “Name” column no longer exists or is no longer a string.
Since I hadn’t found anything else more promising, though, I continued with this approach, aware that it might not be usable as a pattern because of the compile-time trade-off.
Although the answer had four up-votes, it is not clear that either the author or any of his fans have actually tried to execute the code. The code above returns an IEnumerable<User>
but doesn’t actually do anything.
After I’d realized this, I went to MSDN for more information on the SqlQuery
method. The documentation is not encouraging for our purposes (still trying to delete objects without first loading them), as it describes the method as follows (emphasis added),
“Creates a raw SQL query that will return elements of the given generic type. The type can be any type that has properties that match the names of the columns returned from the query, or can be a simple primitive type.”
This does not bode well for deleting objects using this method. Creating an enumerable does very little, though. In order to actually execute the query, you have to evaluate it.
Die Hoffnung stirbt zuletzt (“hope dies last”) [3], as we like to say on this side of the pond, so I tried evaluating the enumerable. A foreach
should do the trick.
var users = context.Database.SqlQuery<User>(
"DELETE FROM Users WHERE Name = @name",
new [] { new SqlParameter("@name", "John") }
);
foreach (var u in users)
{
// NOP?
}
As indicated by the “NOP?” comment, it’s unclear what one should actually do in this loop because the query already includes the command to delete the selected objects.
Our hopes are finally extinguished with the following error message:
That this approach does not work is actually a relief because it would have been far too obtuse and confusing to use in production.
It turns out that the SqlQuery
only works with SELECT
statements, as was strongly implied by the documentation.
var users = context.Database.SqlQuery<User>(
"SELECT * FROM Users WHERE Name = @name",
new [] { new SqlParameter("@name", "John") }
);
Once we’ve converted to this syntax, though, we can just use the much clearer and compile-time–checked version that we started with, repeated below.
var users = context.Users.Where(u => u.Name == "John");
foreach (var u in users)
{
context.Users.Remove(u);
}
context.SaveChanges();
So we’re back where we started, but perhaps a little wiser for having tried.
As a final footnote, I just want to point out how you would perform multiple deletes with the Quino ORM. It’s quite simple, really. Any query that you can use to select objects you can also use to delete objects [4].
So, how would I execute the query above in Quino?
Session.Delete(Session.CreateQuery<User>().WhereEquals(User.MetaProperties.Name, "John").Query);
To make it a little clearer instead of showing off with a one-liner:
var query = Session.CreateQuery<User>();
query.WhereEquals(User.MetaProperties.Name, "John");
Session.Delete(query);
Quino doesn’t support using Linq to create queries, but its query API is still far more statically checked than a raw SQL string. You can see how the query could easily be extended to restrict on much more complex conditions, even including fields on joined tables.
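For example, here is a sketch of such an extension, combining the join syntax from the Quino example at the top of this article with Session.Delete; the User-to-Company relation and its metadata constants are invented for illustration:
var query = Session.CreateQuery<User>();
query.WhereEquals(User.MetaProperties.Name, "John");
// Restrict on a field of a joined table as well (hypothetical relation and fields):
query.Join(User.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Session.Delete(query);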
Note that you should not call context.SaveChanges() inside the foreach-loop. Doing so is wasteful and does not give EF an opportunity to optimize the delete calls into a single SQL statement (see footnote below).
With the following caveats, which generally apply to all queries with any ORM: there may be logical differences between DELETE vs. SELECT operations; DELETE operations may perform differently on some database back-ends; and a given construct may be less well-exercised in a DELETE operation than in a SELECT operation simply because that particular combination has never come up before. Some combination of these reasons possibly accounts for EF’s lack of support for batch deletes.
Published by marco on 5. May 2013 21:36:02 (GMT-5)
I was recently asked a question about merge conflicts in source-control systems.
“[…] there keep being issues of files being over written, changes backed out etc. from people coding in the same file from different teams.”
My response was as follows:
tl;dr: The way to prevent this is to keep people who have no idea what they’re doing from merging files.
Let’s talk about bad merges happening accidentally. Any source-control worth its salt will support at least some form of automatic merging.
An automatic merge is generally not a problem because the system will not automatically merge when there are conflicts (i.e. simultaneous edits of the same lines, or edits that are “close” to one another in the base file).
An automatic merge can, however, introduce semantic issues.
For example, if both sides declared a method with the same name, but in different places in the same file, an automatic merge will include both copies but the resulting file won’t compile (because the same method was declared twice).
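A contrived sketch of what that looks like (the class and method names are invented):
// Branch one added the first Validate(); branch two added the second. The
// automatic merge happily includes both, and the file no longer compiles
// (CS0111: a member with the same signature is already declared).
public class OrderService
{
    public void Validate() { /* added on branch one */ }

    public void Validate() { /* added on branch two */ }
}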
Or, another example is as follows:
// The base version of the method:
public void A(B b)
{
  var a = new A();
  b.Do(a);
  b.Do(a);
  b.Do(a);
}
// One developer appends a call to a.Do():
public void A(B b)
{
  var a = new A();
  b.Do(a);
  b.Do(a);
  b.Do(a);
  a.Do();
}
// The other developer changes the initialization so that a is null:
public void A(B b)
{
  A a = null;
  b.Do(a);
  b.Do(a);
  b.Do(a);
}
// The automatically merged result compiles but calls a.Do() on a null reference:
public void A(B b)
{
  A a = null;
  b.Do(a);
  b.Do(a);
  b.Do(a);
  a.Do();
}
The automatically merged result will compile, but it will crash at run-time. Some tools (like ReSharper) will display a warning when the merged file is opened, showing that a method is being called on a provably null variable. However, if the file is never opened or the warning ignored or overlooked, the program will crash when run.
In my experience, though, this kind of automatic-merge “error” doesn’t happen very often. Code-organization techniques like putting each type in its own file and keeping method bodies relatively compact go a long way toward preventing such conflicts. They help to drastically reduce the likelihood that two developers will be working in the same area of a file.
With these relatively rare automatic-merge errors taken care of, let’s move on to errors introduced deliberately through maliciousness or stupidity. This kind of error is also very rare, in my experience, but I work with very good people.
“Let’s say we have two teams:
Team One − branch one
> Works on file 1
Team Two − branch two
> Works on file 1
Team One promotes file 1 into the Master B branch, there are some conflicts that they are working out but the file is promoted.”
I originally answered that I wasn’t sure what it meant to “promote” a file while still working on it. How can a file be committed or checked in without having resolved all of the conflicts?
As it turns out, it can’t. As documented in TFS Server 2012 and Promoting changes (Stack Overflow), promotion simply means telling TFS to pick up local changes and add them to the list of “Pending Changes”. This is part of a new TFS2012 feature called “Local Workspaces”. A promoted change corresponds to having added a file to a change list in Perforce or having staged a file in Git.
The net effect, though, is that the change is purely local. That it has been promoted has nothing to do with merging or committing to the shared repository. Other users cannot see your promoted changes. When you pull down new changes from the server, conflicts with local “promoted” changes will be indicated as usual, even if TFS has already indicated conflicts between a previous change and another promoted, uncommitted version of the same file. Any other behavior would be madness. [1]
“Team Two checks in their file 1 into the Master B branch. They back out the changes that Team One made without telling anyone anything.”
There’s your problem. This should never happen unless Team Two has truly determined that their changes have replaced all of the work that Team One did or otherwise made it obsolete. If people don’t know how to deal with merges, then they should not be merging.
Just as Stevie Wonder’s not allowed behind the wheel of a car, neither should some developers be allowed to deal with merge conflicts. In my opinion, though, any developer who can’t deal with merges in code that he or she is working on should be moved to another team or, possibly, job. You have to know your own code and you have to know your tools. [2]
“Team One figures out the conflicts in their branch and re-promotes file one (and other files) to Master B branch. The source control system remembers that file 1 was backed out by Team Two so it doesn’t promote file 1 but doesn’t let the user know.”
This sounds insane. When a file is promoted—i.e. added to the pending changes—it is assumed that the current version is added to the pending changes, akin to staging a file in Git. When further changes are made to the file locally, the source-control system should indicate that it has changed since having been promoted (i.e. staged).
When you re-promote the file (re-stage it), TFS should treat that as the most recent version in your workspace. When you pull down the changes from Team 2, you will have all-new conflicts to resolve because your newly promoted file will still be in conflict with the changes they made to “file 1”—namely that they threw away all of the changes that you’d made previously.
And, I’m not sure how it works in TFS, but in Git, you can’t “back out” a commit without leaving a trail: either the backed-out changes are reconciled in a merge commit or they are explicitly undone in a “revert” commit.
Either way, your local changes will cause a conflict because they will have altered the same file in the same place as either the “merge” or “revert” commit and—this is important—will have done so after that other commit.
To recap, let me summarize what this sounds like:
I don’t believe that this is really possible—even with TFS—but, if this is a possibility with your source-control system, then you have two problems: a tool that silently discards changes and developers who don’t know how to merge.
There is probably a setting in your source-control system that disallows simultaneous editing for files. This is a pretty huge restriction, but if your developers either can’t or won’t play nice, you probably have no choice.
Published by marco on 12. Feb 2013 21:44:37 (GMT-5)
Updated by marco on 12. Apr 2013 10:01:17 (GMT-5)
The paper Uniqueness and Reference Immutability for Safe Parallelism by Colin S. Gordon, Matthew J. Parkinson, Jared Parsons, Aleks Bromfield, Joe Duffy (Microsoft Research) is quite long (26 pages), detailed and involved. To be frank, most of the notation was foreign to me—to say nothing of making heads or tails of most of the proofs and lemmas—but I found the higher-level discussions and conclusions quite interesting.
The abstract is concise and describes the project very well:
“A key challenge for concurrent programming is that side-effects (memory operations) in one thread can affect the behavior of another thread. In this paper, we present a type system to restrict the updates to memory to prevent these unintended side-effects. We provide a novel combination of immutable and unique (isolated) types that ensures safe parallelism (race freedom and deterministic execution). The type system includes support for polymorphism over type qualifiers, and can easily create cycles of immutable objects. Key to the system’s flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking. Our type system models a prototype extension to C# that is in active use by a Microsoft team. We describe their experiences building large systems with this extension. We prove the soundness of the type system by an embedding into a program logic.”
The project proposes a type-system extension with which developers can write provably safe parallel programs—i.e. “race freedom and deterministic execution”—with the amount of actual parallelism determined when the program is analyzed and compiled rather than decided by a programmer creating threads of execution.
The “isolation” part of this type system reminds me a bit of the way that SCOOP addresses concurrency. That system also allows programs to designate objects as “separate” from other objects while also releasing the program from the onus of actually creating and managing separate execution contexts. That is, the syntax of the language allows a program to be written in a provably correct way (at least as far as parallelism is concerned; see the “other provable-language projects” section below). In order to execute such a program, the runtime loads not just the program but also another file that specifies the available virtual processors (commonly mapped to threads). Sections of code marked as “separate” can be run in parallel, depending on the available number of virtual processors. Otherwise, the program runs serially.
In SCOOP, methods are used as a natural isolation barrier, with input parameters marked as “separate”. See SCOOP: Concurrency for Eiffel (Eiffel.com) and SCOOP (software) (Wikipedia) for more details. The paper also contains an entire section listing other projects—many implemented on the JVM—that have attempted to make provably safe programming languages.
The system described in this paper goes much further, adding immutability as well as isolation (the same concept as “separate” in SCOOP). An interesting extension to the type system is that isolated object trees are free to have references to immutable objects (since those can’t negatively impact parallelism). This allows for globally shared immutable state and reduces argument-passing significantly. Additionally, there are readable and writable references: the former can only be read but may be modified by other objects (otherwise it would be immutable); the latter can be read and written and is equivalent to a “normal” object in C# today. In fact, “[…] writable is the default annotation, so any single-threaded C# that does not access global state also compiles with the prototype.”
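As a loose analogy in today’s C# (this is not the paper’s type system; it only approximates the readable-versus-immutable distinction, using the System.Collections.Immutable package):
using System.Collections.Generic;
using System.Collections.Immutable;

class PermissionAnalogy
{
    static void Main()
    {
        var backing = new List<int> { 1, 2, 3 };

        // "readable": this reference cannot be used to write, but other code
        // holding the writable reference can still change the underlying data.
        IReadOnlyList<int> readable = backing;
        backing.Add(4);

        // "immutable": a frozen snapshot that nobody can change; "adding"
        // produces a new list and leaves the original untouched.
        ImmutableList<int> immutable = backing.ToImmutableList();
        ImmutableList<int> extended = immutable.Add(5);
    }
}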
In this safe-parallel extension, a standard type system is extended so that every type can be assigned such a permission and there is “support for polymorphism over type qualifiers”. That is, the extended type system includes the permission in the type, so that, given B => A, a reference to a readable B can be passed to a method that expects a readable A. In addition, covariance is also supported for generic parameter types.
When they say that the “[k]ey to the system’s flexibility is the ability to recover immutable or externally unique references after violating uniqueness without any explicit alias tracking”, they mean that the type system allows programs to specify sections that accept isolated references as input, lets them convert to writable references and then convert back to isolated objects—all without losing provably safe parallelism. This is quite a feat since it allows programs to benefit from isolation, immutability and provably safe parallelism without significantly changing common programming practice. In essence, it suffices to decorate variables and method parameters with these permission extensions to modify the types and let the compiler guide you as to further changes that need to be made. That is, an input parameter for a method will be marked as immutable
so that it won’t be changed and subsequent misuse has to be corrected.
Even better, they found that, in practice, it is possible to use extension methods to allow parallel and standard implementations of collections (lists, maps, etc.) to share most code.
“A fully polymorphic version of a map() method for a collection can coexist with a parallelized version pmap() specialized for immutable or readable collections. […] Note that the parallelized version can still be used with writable collections through subtyping and framing as long as the mapped operation is pure; no duplication or creation of an additional collection just for concurrency is needed.”
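A rough analogy of that coexistence in today’s C#, using extension methods and PLINQ (this is not the paper’s system; plain C# cannot express the permission checks that would make the parallel version provably safe):
using System;
using System.Collections.Generic;
using System.Linq;

static class MapExtensions
{
    // The general-purpose version works on any sequence.
    public static IEnumerable<TResult> Map<T, TResult>(this IEnumerable<T> source, Func<T, TResult> f)
    {
        return source.Select(f);
    }

    // A parallel variant coexists as just another extension method; in the
    // paper's system, it would additionally require an element permission
    // (immutable/readable) to guarantee that the parallelism is safe.
    public static IEnumerable<TResult> PMap<T, TResult>(this IEnumerable<T> source, Func<T, TResult> f)
    {
        return source.AsParallel().Select(f);
    }
}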
Much of the paper is naturally concerned with proving that their type system actually does what it says it does. As mentioned above, at least 2/3 of the paper is devoted to lemmas and large swaths of notation. For programmers, the more interesting part is the penultimate section that discusses the extension to C# and the experiences in using it for larger projects.
“A source-level variant of this system, as an extension to C#, is in use by a large project at Microsoft, as their primary programming language. The group has written several million lines of code, including: core libraries (including collections with polymorphism over element permissions and data-parallel operations when safe), a webserver, a high level optimizing compiler, and an MPEG decoder.”
Several million lines of code is, well, it’s an enormous amount of code. I’m not sure how many programmers they have or how they’re counting lines or how efficiently they write their code, but millions of lines of code suggests generated code of some kind. Still, taken with the next statement on performance, that much code more than proves that the type system is viable.
“These and other applications written in the source language are performance-competitive with established implementations on standard benchmarks; we mention this not because our language design is focused on performance, but merely to point out that heavy use of reference immutability, including removing mutable static/global state, has not come at the cost of performance in the experience of the Microsoft team.”
Not only is performance not impacted, but the nature of the typing extensions allows the compiler to know much more about which values and collections can be changed, which affects how aggressively this data can be cached or inlined.
“In fact, the prototype compiler exploits reference immutability information for a number of otherwise-unavailable compiler optimizations. […] Reference immutability enables some new optimizations in the compiler and runtime system. For example, the concurrent GC can use weaker read barriers for immutable data. The compiler can perform more code motion and caching, and an MSIL-to-native pass can freeze immutable data into the binary.”
In the current implementation, there is an unstrict
block that allows the team at Microsoft to temporarily turn off the new type system and to ignore safety checks. This is a pragmatic approach which allows the software to be run before it has been proven 100% parallel-safe. This is still better than having no provably safe blocks at all. Their goal is naturally to remove as many of these blocks as possible—and, in fact, this requirement drives further refinement of the type system and library.
“We continue to work on driving the number of unstrict blocks as low as possible without over-complicating the type system’s use or implementation.”
The project is still a work-in-progress but has seen quite a few iterations, which is promising. The paper was written in 2012; it would be very interesting to take it for a test drive in a CTP.
A related project at Microsoft Research, Spec#, contributed a lot of basic knowledge about provable programs. The authors even state that the “[…] type system grew naturally from a series of efforts at safe parallelism. […] The earliest version was simply copying Spec#’s [Pure] method attribute, along with a set of carefully designed task-and data-parallelism libraries.” Spec#, in turn, is a “[…] formal language for API contracts (influenced by JML, AsmL, and Eiffel), which extends C# with constructs for non-null types, preconditions, postconditions, and object invariants”.
Though the implementation of this permissions-based type system may have started with Spec#, the primary focus of that project was more a valiant attempt to bring Design-by-Contract principles (examples and some discussion here (encodo.com)) to the .NET world via C#. Though Spec# has downloadable code (CodePlex), the project hasn’t really been updated in years. This is a shame, as support for Eiffel [1] in .NET, mentioned above as one of the key influences of Spec#, was dropped by ISE Eiffel long ago.
Spec#, in turn, was mostly replaced by Microsoft Research’s Contracts project (an older version of which was covered in depth in Microsoft Code Contracts: Not with a Ten-foot Pole (earthli.com)). The Contracts project seems to be alive and well: the most recent release is from October, 2012. I have not checked it out since my initial thumbs-down review (linked above) but did note in passing that the implementation is still (A) library-only and (B) does not support Visual Studio 2012.
The library-only restriction is particularly galling, as such an implementation can lead to repeated code and unwieldy anti-patterns. As documented in the Contracts FAQ, the current implementation of the “tools take care of enforcing consistency and proper inheritance of contracts” but this is presumably accomplished with compiler errors that require the programmer to include contracts from base methods in overrides.
The seminal work Object-oriented Software Construction by Bertrand Meyer (vol. II in particular) goes into tremendous detail on a type system that incorporates contracts directly. The type system discussed in this article covers only parallel safety: null-safety and other contracts are not covered at all. If you’re at all interested in these types of language extensions, the vol.2 of OOSC is a great read. The examples are all in Eiffel but should be relatively accessible. Though some features—generics, notably but also tuples, once routines and agents (earthli.com)—have since made their way into C# and other more commonly used languages, many others—such as contracts, anchored types (contravariance is far too constrained in C# to allow them), covariant return types, covariance everywhere, multiple inheritance, explicit feature removal, loop variants and invariants, etc.—are still not available. Subsequent interesting work has also been done on extensions that allow creation of provably null-safe programs (earthli.com), something also addressed in part by Microsoft Research’s Contracts project.
Published by marco on 3. Feb 2013 23:04:09 (GMT-5)
In order to program in 2013, it is important not to waste any time honing your skills with outdated tools and work-flows. What are the essential pieces of software for developing software in 2013?
Even for the smallest projects, there is no reason to forgo any of these tools.
tl;dr: It’s 2013 and your local commit history is not sacrosanct. No one wants to see how you arrived at the solution; they just want to see clean commits that explain your solution as clearly as possible. Use git; use rebase; use “rebase interactive”; use the index; stage hunks; squash merge; go nuts. [2]
I would like to focus on the “versioning” part of the tool-chain. Source control tells the story of your code, showing how it evolved to where it is at any given point. If you look closely at the “Encodo Branching Model” [3] diagram (click to enlarge), you can see the story of the source code:
Small, precise, well-documented commits are essential in order for others to understand the project—especially those who weren’t involved in developing the code. It should be obvious from which commits you made a release. You should be able to go back to any commit and easily start working from there. You should be able to maintain multiple lines of development, both for maintenance of published versions and for development of new features. The difficulty of merging these branches should be determined by the logical distance between them rather than by the tools. Merging should almost always be automatic.
Nowhere in those requirements does it say that you’re not allowed to lie about how you got to that pristine tree of commits.
A few good articles about Git have recently appeared—Understanding the Git Workflow by Benjamin Sandofsky is one such—explaining better than ever why rewriting history is better than server-side, immutable commits.
In the article cited above, Sandofsky divides his work up into “Short-lived work […] larger work […] and branch bankruptcy.” These concepts are documented to some degree in the Branch Management chapter of the Encodo Git Handbook (of which I am co-author). I will expand on these themes below.
Note: The linked articles deal exclusively with the command line, which isn’t everyone’s favorite user interface (I, for one, like it). We use the SmartGit/Hg client for visualizing diffs, organizing commits and browsing the log. We also use the command-line for a lot of operations, but SmartGit is a very nice tool and version 3 supports nearly all of the operations described in this article.
As you can see from the diagram above, a well-organized and active project will have multiple branches. Merging and rebasing are two different ways of getting commits from one branch into another.
Merging commits into a branch creates a merge commit, which shows up in the history to indicate that n commits were made on a separate branch. Rebasing those commits instead re-applies them to the head of the indicated branch without a merge commit. In both cases there can be conflicts, but one method doesn’t pose a greater risk of them than the other. [4] You cannot tell from the history that rebased commits were developed in a separate branch. You can, however, tell that the commits were rebased because the author date (the time the commit was originally created) differs from the commit date (the last time that the commit was applied).
At Encodo, we primarily work in the master branch because we generally work on very manageable, bite-sized issues that can easily be managed in a day. Developers are free to use local branches but are not required to do so. If some other requirement demands priority, we shunt the pending issue into a private branch. Such single-issue branches are focused and involve only a handful of files. It is not at all important to “remember” that the issue was developed in a branch rather than the master branch. If there are several commits, it may be important for other users to know that they were developed together and a merge-commit can be used to indicate this. Naturally, larger changes are developed in feature branches, but those are generally the exception rather than the rule.
Remember: Nowhere in those requirements does it say that you’re not allowed to lie about how you got to that pristine tree of commits.
Otherwise? Local commit history is absolutely not sacrosanct. We rebase like crazy to avoid unwanted merge commits. That is, when we pull from the central repository, we rebase our local commits on top of the commits that come from the origin. This has worked well for us.
If the local commit history is confusing—and this will sometimes come up during the code review—we use an interactive rebase to reorganize the files into a more soothing and/or understandable set of commits. See Sandofsky’s article for a good introduction to using interactive rebasing to combine and edit commits.
Naturally, we weigh the amount of confusion caused by the offending commits against the amount of effort required to clean up the history. We don’t use bisect [5] very often, so we don’t invest a lot of time in enforcing the clean, compilable commits required by that tool. For us, the history is interesting, but we rarely go back farther than a few weeks in the log. [6]
At Encodo, there are only a few reasons to retain a merge commit in the official history:
There are no rules for local branches: you can name them whatever you like. However, if you promote a local branch to a private branch, at Encodo we use the developer’s initials as the prefix for the branch. My branches are marked as “mvb/feature1”, for example.
What’s the difference between the two? Private branches may get pushed to our common repository. Why would you need to do that? Well, I, for example, have a desktop at work and, if I want to work at home, I have to transfer my workspace somehow to the machine at home. One solution is to work on a virtual machine that’s accessible to both places; another is to remote in to the desktop at work from home; the final one is to just push that work to the central repository and pull it from home. The offline solution has the advantage of speed and less reliance on connectivity.
What often happens to me is that I start work on a feature but can only spend an hour or two on it before I get pulled off onto something else. I push the private branch, work on it a bit more at home, push back, work on another, higher-priority feature branch, merge that into master, work on master, whatever. A few weeks later, I’ve got a private branch with a few ugly commits, some useful changes and a handful of merge commits from the master branch. The commit history is a disgusting mess, and I have a sneaking suspicion that I’ve only made changes to about a dozen files but have a dozen commits for those changes.
That’s where the aforementioned “branch bankruptcy” comes in. You’re not obligated to keep that branch; you can keep the changes, though. As shown in the referenced article, you execute the following git commands:
git checkout master
git checkout -b cleaned_up_branch
git merge --squash private_feature_branch
git reset
The --squash tells git to squash all of the changes from the private_feature_branch into the index (staging); the subsequent reset then empties the index so that those changes end up in the working tree. From here, you can make a single, clean, well-written commit or several commits that correspond logically to the various changes you made.
Git also lets you lose your attachment to checking in all the changes in a file at once: if a file has changes that correspond to different commits, you can add only selected differences in a file to the index (staging). In praise of Git’s index by Aristotle Pagaltzis (Plasmasturm) provides a great introduction. If you, like me, regularly take advantage of refactoring and cleanup tools while working on something else, you’ll appreciate the ability to avoid checking in dozens of no-brainer cleanup/refactoring changes along with a one-liner bug-fix. [8]
I recently renamed several projects in our solution, which involved renaming the folders as well as the project files and all references to those files and folders. Git automatically recognizes these kinds of renames as long as the old file is removed and the new file is added in the same commit.
I selected all of the files for the rename in SmartGit and committed them, using the index editor to stage only the hunks from the project files that corresponded to the rename. Nice and neat. I selected a few other files and committed those as a separate bug-fix. Two seconds later, the UI refreshed and showed me a large number of deleted files that I should have included in the first commit. Now, one way to go about fixing this is to revert the two commits and start all over, picking the changes apart (including playing with the index editor to stage individual hunks).
Instead of doing that, I did the following:
Now my master branch was ready to push to the server, all neat and tidy. And nobody was the wiser.
bisect is a git feature that executes a command against various commits to try to localize the commit that caused a build or test failure. Basically, you tell it the last commit that worked and git uses a binary search to find the offending commit. Of course, if you have commits that don’t compile, this won’t work very well. We haven’t used this feature very much because we know the code in our repositories well and using blame and log is much faster. Bisect is much more useful for maintainers who don’t know the code very well but still need to figure out at which commit it stopped working.
Published by marco on 22. Nov 2012 23:24:02 (GMT-5)
The following ruminations were written seven years ago but have held up remarkably well. They have been published with minor updates.
This article deals with the situation illustrated below, specifically the question raised in the comment.
if (! $folder_id)
{
  $this->db->logged_query ("SELECT folder_id FROM " .
                           $this->app->table_names->objects .
                           " WHERE id = $obj->object_id");
  if ($this->db->next_record ())
    $folder_id = $this->db->f ("folder_id");
  else
    // raise exception? ignore? what to do?
}
Above we see a situation in which you may decide against stricter enforcement because, while the error is clear, the appropriate reaction is not. This is also a big part of working with contracts: deferring reactions. Often—especially when developing libraries—you’re in code so deep that the desired reaction could be one of many, depending on how that code is deployed.
The code above is taken from the publishing loop in the webcore; it’s used to publish comments. In effect, the code has detected that a comment object id has been passed in that doesn’t correspond to anything in the system. It’s bogus. It’s wrong.
Some deployments—I would hazard most—would just like to silently ignore the error and publish as much as possible. Silently ignoring an error will always bite you in the ass in the end (pun intended). The key here is that whereas the person deploying the final system should be perfectly free to ignore the error, you, as the library developer, cannot and must not.
Let’s see what kind of reactions we could have here. Well, isn’t that what exceptions were invented for? They’re for transmitting error conditions up out of deep library code. Problem solved. For more severe errors in which the code cannot continue, the answer is quite clear: you simply throw an exception. However, in the situation above, it’s not so clear.
The problem is easily skipped and most of the rest of the job can be finished. Here is where deferral comes in. Just call a function that will handle it later. This function can log the error or warning, display it to the user, ask to abort/retry/ignore, consult a table for same, throw an exception or just ignore it. It’s not your problem to dispatch solutions to encountered errors. It’s your job to detect them and maintain the integrity of the running code.
Simply throwing an exception, no matter what the error condition, is, in effect, making a decision about how the error will be handled. Control is lost because the exception handler is necessarily higher up. This is a bad thing if you’d actually like the code to do the best it can. As any experience at all will have shown you, some errors are just warnings or hints. It’s not just black and white, error or not. Many deployments of the system containing the code above will actually treat the issue as a warning and log it for the database techs to address.
However, to assume the opposite, that callers want errors to be swallowed, cheats those callers as well. It cheats them because it becomes incredibly hard to find errors; they must be detected by subtle logic or data problems (e.g. Hmmm…the log shows it only sent 500 emails, I thought there were 503 subscribers…). If the system never complains or logs anything, the end user calls you first. It cheats you because you can never adequately test your system because it never complains. Everything’s OK. It kind of works. It mostly works.
The desire for safety or avoiding crashes or exceptions on the client side should never override the desire to have correct code that detects error conditions. If you write library code or end-user code, that code deals mostly with detecting and reporting misused functions. The functionality itself is generally straightforward; it’s wrapping the interface around it that’s hard. The only thing that you’ll probably spend more time on is hunting down memory bugs—and, yes, even if you’re using a garbage-collected runtime, you can still have memory bugs. What would you call it if the memory required by your publication script increases in proportion to the number of subscribers and mails?
If you’re wondering what I ended up doing in the case above, I decided on a function called ‘raise’. It sounds like an exception, and that’s usually what happens: it breaks the code on that line, but a little more elegantly than the usual PHP die statement. While the default handler simply issues a fancy die statement, that handler can be replaced with a different one, one that redirects to an HTML page with a nicely formatted error printout and a form for submitting the error.
Since this code is likely to run inside a script that just wants to send subscriptions and doesn’t care about data integrity errors, the handler would probably be replaced with something that suppresses the exception, but logs the error. That way, once the subscription run is done, you can view the error log and see if there are data integrity problems, and, perhaps more importantly, you see them all at once instead of just one at a time. And, more importantly still, those subscribers for whom there were no problems received their mail on time.
Published by marco on 22. Nov 2012 19:45:47 (GMT-5)
The following article was originally published on the Encodo blogs and is cross-published here.
In the latest version of Quino—version 1.8.5—we took a long, hard look at the patterns we were using to create metadata. The metadata for an application includes all of the usual Quino stuff: classes, properties, paths, relations. With each version, though we’re able to use the metadata in more places. That means that the metadata definition code grows and grows. We needed some way to keep a decent overview of that metadata without causing too much pain when defining it.
In order to provide some background, the following are the high-level requirements that we kept in mind while designing the new pattern and supporting framework.
Quino metadata has always been defined using a .NET language—in our case, we always use C# to define the metadata, using the MetaBuilder or InMemoryMetaBuilder to compose the application model. This approach satisfies the need to leverage existing tools, refactoring and introspection.
Since Quino metadata is an in-memory construct, there will always be a .NET API for creating metadata. This is not to say that there will never be a DSL to define Quino metadata but that such an approach is not the subject of this post.
Quino applications have always been able to define and integrate metadata modules (e.g. reporting or security) using an IMetaModuleBuilder. Modules solved interdependency issues by splitting the metadata-generation into several phases:
In this way, when a module needed to add a path between a class that it had defined and a class defined in another module, it could be guaranteed that classes and foreign keys for all modules had been defined before any paths were created. Likewise for classes that wanted to define relations based on paths defined in other modules.
The limitation of the previous implementation was that a module generator always created its own module and builder and could not simply re-use those created by another generator. Basically, there was no “lightweight” way of splitting metadata-generation into separate files for purely organizational purposes.
There were also a few issues with the implementation of the main model-generation code. The previous pattern depended heavily on local variables, all defined within one mammoth function. Separating code into individual method calls was ad hoc—each project did it a little differently—and involved a lot of migration of local variables to instance variables. With all code in a single method, file-structure navigation tools couldn’t help at all. The previous pattern prescribed using file comments or regions that could be located using “find in file”. This was clearly sub-optimal.
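To make the contrast concrete, here is a purely illustrative sketch of the old style: one mammoth method held together by local variables. The builder calls mirror those shown later in this article (with /*Guid*/ standing in for the actual Guid arguments, as in the other listings), but the class and method themselves are invented for this example.

public class OldStyleModelGenerator
{
  public void GenerateModel(MetaBuilder builder)
  {
    // In a real project, hundreds of lines like these lived in a single method...
    var company = builder.AddClassWithDefaultPrimaryKey("Company", /*Guid*/, /*Guid*/);
    var person = builder.AddClassWithDefaultPrimaryKey("Person", /*Guid*/, /*Guid*/);
    builder.AddInvisibleProperty(person, "CompanyId", MetaType.Key, true, /*Guid*/);

    // ...and the local variables were needed everywhere below, so extracting
    // methods meant manually promoting them to instance variables.
    var companyPersonPath = builder.AddOneToManyPath(company, "Id", person, "CompanyId", /*Guid*/, /*Guid*/);
    builder.AddRelation(company, "People", "", companyPersonPath);
    builder.AddRelation(person, "Company", "", companyPersonPath);
  }
}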
The new pattern, which can be applied to all models, big or small, includes the following parts:

- A model generator that implements the IMetaModelGenerator interface. This class is used by the application configuration and various tools (e.g. the code generator or UML generator) to create the model.
- A model-elements class, whose members are typically created in the AddClasses() step and referenced in the AddPaths, AddProperties and AddLayouts steps. The elements class typically has two properties, called Classes and Paths.

This may sound like a lot of overhead for a simple application, but it’s really not that much extra code. The benefits are:
But enough chatter; let’s take a look at the absolute minimum boilerplate for an empty model.
public class DemoModelElements
{
public DemoModelElements()
{
Classes = new DemoModelClasses();
Paths = new DemoModelPaths();
}
public DemoModelClasses Classes { get; private set; }
public DemoModelPaths Paths { get; private set; }
}
public class DemoModelPaths
{
}
public class DemoModelClasses
{
}
public class DemoCoreGenerator : DependentMetadataGeneratorBase<DemoModelGenerator, DemoModelElements, MetaBuilder>
{
}
public class DemoModelGenerator : MetaBuilderBasedModelGeneratorBase<DemoModelElements>
{
protected override void AddMetadata()
{
Builder.Include<DemoCoreGenerator>();
}
}
The code above is functional but doesn’t actually create any metadata. So what does it do?

- The generic argument of MetaBuilderBasedModelGeneratorBase indicates the type of Elements that will be exposed by this model generator. The elements class is created automatically and is available as the property Elements (as we’ll see in the examples below). Additionally, we’re using a ModelGeneratorBase that is based on a MetaBuilder, which means that the property Builder is also available and is of type MetaBuilder.
- DemoCoreGenerator is a dependent generator—it’s lightweight and uses the elements and builder from its owner. The exact types are shown in the class declaration; it can be read as: get elements of type DemoModelElements and a builder of type MetaBuilder from the generator with type DemoModelGenerator. The initial generic argument can be any other metadata generator that implements the IElementsProvider<TElements, TBuilder> interface.
- AddMetadata is overridden to include the metadata created by DemoCoreGenerator in the model.

Even though it’s not very much code, you can create a snippet or a file template with Visual Studio, or a Live Template or file template with ReSharper, to quickly create a new model.
Now, let’s fill the empty model with some metadata. The first step is to define the model that we’re going to build. That part goes in the AddMetadata() method. [3]
public class DemoModelGenerator : MetaBuilderBasedModelGeneratorBase<DemoModelElements>
{
protected override void AddMetadata()
{
Builder.CreateModel<DemoModel>("Demo", /*Guid*/);
Builder.CreateMainModule("Encodo.Quino");
Builder.Include<DemoCoreGenerator>();
}
}
A typical next step is to define a class. Let’s do that.
public class DemoModelClasses
{
public IMetaClass Company { get; set; }
}
public class DemoCoreGenerator : DependentMetadataGeneratorBase<DemoModelGenerator, DemoModelElements, MetaBuilder>
{
protected override void AddClasses()
{
Elements.Classes.Company = Builder.AddClassWithDefaultPrimaryKey("Company", /*Guid*/, /*Guid*/);
}
}
As you can see, we added a new class to the elements and created and assigned it in the AddClasses() phase of metadata-generation.
An obvious next step is to create another class and define a path between them.
public class DemoModelClasses
{
public IMetaClass Company { get; set; }
public IMetaClass Person { get; set; }
}
public class DemoCoreGenerator : DependentMetadataGeneratorBase<DemoModelGenerator, DemoModelElements, MetaBuilder>
{
protected override void AddClasses()
{
Elements.Classes.Company = Builder.AddClassWithDefaultPrimaryKey("Company", /*Guid*/, /*Guid*/);
Elements.Classes.Person = Builder.AddClassWithDefaultPrimaryKey("Person", /*Guid*/, /*Guid*/);
Builder.AddInvisibleProperty(Elements.Classes.Person, "CompanyId", MetaType.Key, true, /*Guid*/);
}
protected override void AddPaths()
{
Elements.Paths.CompanyPersonPath = Builder.AddOneToManyPath(
Elements.Classes.Company, "Id",
Elements.Classes.Person, "CompanyId",
/*Guid*/, /*Guid*/
);
}
}
Having a path is not enough, though. We can also define how the relations on that path are exposed in the classes.
public class DemoCoreGenerator : DependentMetadataGeneratorBase<DemoModelGenerator, DemoModelElements, MetaBuilder>
{
protected override void AddClasses()
{
Elements.Classes.Company = Builder.AddClassWithDefaultPrimaryKey("Company", /*Guid*/, /*Guid*/);
Elements.Classes.Person = Builder.AddClassWithDefaultPrimaryKey("Person", /*Guid*/, /*Guid*/);
Builder.AddInvisibleProperty(Elements.Classes.Person, "CompanyId", MetaType.Key, true, /*Guid*/);
}
protected override void AddPaths()
{
Elements.Paths.CompanyPersonPath = Builder.AddOneToManyPath(
Elements.Classes.Company, "Id",
Elements.Classes.Person, "CompanyId",
/*Guid*/, /*Guid*/
);
}
protected override void AddProperties()
{
Builder.AddRelation(Elements.Classes.Company, "People", "", Elements.Paths.CompanyPersonPath);
Builder.AddRelation(Elements.Classes.Person, "Company", "", Elements.Paths.CompanyPersonPath);
}
}
OK, now we have a model with two entities—companies and people—that are related to each other so that a company has a list of people and each person belongs to a company.
Now we’d like to make the metadata support German as well as English. Quino naturally supports more generalized ways of doing this (e.g. importing from files), but let’s just add the metadata manually to see what that would look like (unaffected methods are left off for brevity).
public class DemoModelElements
{
public DemoModelElements()
{
Classes = new DemoModelClasses();
Paths = new DemoModelPaths();
}
public ILanguage English { get; set; }
public ILanguage German { get; set; }
public DemoModelClasses Classes { get; private set; }
public DemoModelPaths Paths { get; private set; }
}
public class DemoCoreGenerator : DependentMetadataGeneratorBase<DemoModelGenerator, DemoModelElements, MetaBuilder>
{
protected override void AddCoreElements()
{
Elements.English = Builder.AddDisplayLanguage("en-US", "English");
Elements.German = Builder.AddDisplayLanguage("de-CH", "Deutsch");
}
protected override void AddClasses()
{
var company = Elements.Classes.Company = Builder.AddClassWithDefaultPrimaryKey("Company", /*Guid*/, /*Guid*/);
company.Caption.SetValue(Elements.English, "Company");
company.Caption.SetValue(Elements.German, "Firma");
company.PluralCaption.SetValue(Elements.English, "Companies");
company.PluralCaption.SetValue(Elements.German, "Firmen");
var person = Elements.Classes.Person = Builder.AddClassWithDefaultPrimaryKey("Person", /*Guid*/, /*Guid*/);
Builder.AddInvisibleProperty(person, "CompanyId", MetaType.Key, true, /*Guid*/);
person.Caption.SetValue(Elements.English, "Person");
person.Caption.SetValue(Elements.German, "Person");
person.PluralCaption.SetValue(Elements.English, "People");
person.PluralCaption.SetValue(Elements.German, "Personen");
}
}
Note that I created a local variable for both company and person. One reason for this is to reduce the number of references to the Elements.Classes.Person and Elements.Classes.Company properties. It’s useful to keep the number of references to a minimum in order to get the maximum benefit from searching for usages with a tool like ReSharper. Otherwise, the noise drowns out the signal and you’ll get hundreds of references when there are actually only a few dozen “real” references.

You can see that the metadata-generation code is still manageable, but it’s growing. Once we’ve filled out all of the properties, relations, translations, layouts and view aspects for the person and company classes, we’ll have a file that’s several hundred lines long. A file of that size is still manageable and, since we have methods, it’s eminently navigable with a file-structure browser.
If we don’t mind keeping—or we’d rather keep—everything in one file, we can see more structure by splitting the code into more methods. This is really easy to do because we’re using the elements to reference other parts of metadata instead of local variables. For example, let’s move the class initialization code for the person and company entities to separate methods (unaffected methods are left off for brevity).
public class DemoCoreGenerator : DependentMetadataGeneratorBase<DemoModelGenerator, DemoModelElements, MetaBuilder>
{
protected override void AddClasses()
{
AddCompany();
AddPerson();
}
private void AddCompany()
{
var company = Elements.Classes.Company = Builder.AddClassWithDefaultPrimaryKey("Company", /*Guid*/, /*Guid*/);
company.Caption.SetValue(Elements.English, "Company");
company.Caption.SetValue(Elements.German, "Firma");
company.PluralCaption.SetValue(Elements.English, "Companies");
company.PluralCaption.SetValue(Elements.German, "Firmen");
}
private void AddPerson()
{
var person = Elements.Classes.Person = Builder.AddClassWithDefaultPrimaryKey("Person", /*Guid*/, /*Guid*/);
Builder.AddInvisibleProperty(person, "CompanyId", MetaType.Key, true, /*Guid*/);
person.Caption.SetValue(Elements.English, "Person");
person.Caption.SetValue(Elements.German, "Person");
person.PluralCaption.SetValue(Elements.English, "People");
person.PluralCaption.SetValue(Elements.German, "Personen");
}
}
While this is a good technique for small models—with anywhere up to five entities—most models are larger and include entities with sizable metadata definitions. Another thing to consider is that, when working with larger teams, it’s often best to keep a central item like the metadata definition as modular as possible.
To scale the pattern up for larger models, we can move code for larger entity definitions into separate generators. As soon as we move an entity to its own generator, we’re faced with the question of where we should create paths for that entity. A path doesn’t really belong to one class or the other; in which generator should it go?
Well, we thought about that and came to the conclusion that the pattern should be to just create a separate generator for all paths in the model (or multiple path-only generators if you have a larger model). That is, when a model gets a bit larger, it should include the following generators (using the name “Demo” from the examples above):
DemoCoreGenerator
DemoPathGenerator
DemoCompanyGenerator
DemoPersonGenerator
The DemoCoreGenerator will create metadata and assign elements like the display languages. It’s also recommended to define base types like enumerations and very simple classes [4] in the core as well. Obviously, as the model grows, the core generator may also get larger. This isn’t a problem: just split the contents logically into multiple generators.
For the purposes of this example, though, we only have a single core and a single path generator and two entity generators. Since these generators will all be dependent on the model’s builder and elements, the first step is to define a base class that will be used by the other generators.
internal class DemoDependentGenerator : DependentMetadataGeneratorBase<DemoModelGenerator, DemoModelElements, MetaBuilder>
{
}
public class DemoCoreGenerator : DemoDependentGenerator
{
protected override void AddCoreElements()
{
Elements.English = Builder.AddDisplayLanguage("en-US", "English");
Elements.German = Builder.AddDisplayLanguage("de-CH", "Deutsch");
}
}
public class DemoPathGenerator : DemoDependentGenerator
{
protected override void AddPaths()
{
Elements.Paths.CompanyPersonPath = Builder.AddOneToManyPath(
Elements.Classes.Company, "Id",
Elements.Classes.Person, "CompanyId",
/*Guid*/, /*Guid*/
);
}
}
public class DemoCompanyGenerator : DemoDependentGenerator
{
protected override void AddClasses()
{
var company = Elements.Classes.Company = Builder.AddClassWithDefaultPrimaryKey("Company", /*Guid*/, /*Guid*/);
company.Caption.SetValue(Elements.English, "Company");
company.Caption.SetValue(Elements.German, "Firma");
company.PluralCaption.SetValue(Elements.English, "Companies");
company.PluralCaption.SetValue(Elements.German, "Firmen");
}
protected override void AddProperties()
{
Builder.AddRelation(Elements.Classes.Person, "Company", "", Elements.Paths.CompanyPersonPath);
}
}
public class DemoPersonGenerator : DemoDependentGenerator
{
protected override void AddClasses()
{
var person = Elements.Classes.Person = Builder.AddClassWithDefaultPrimaryKey("Person", /*Guid*/, /*Guid*/);
Builder.AddInvisibleProperty(person, "CompanyId", MetaType.Key, true, /*Guid*/);
person.Caption.SetValue(Elements.English, "Person");
person.Caption.SetValue(Elements.German, "Person");
person.PluralCaption.SetValue(Elements.English, "People");
person.PluralCaption.SetValue(Elements.German, "Personen");
}
protected override void AddProperties()
{
Builder.AddRelation(Elements.Classes.Company, "People", "", Elements.Paths.CompanyPersonPath);
}
}
public class DemoModelGenerator : MetaBuilderBasedModelGeneratorBase<DemoModelElements>
{
protected override void AddMetadata()
{
Builder.CreateModel<DemoModel>("Demo", /*Guid*/);
Builder.CreateMainModule("Encodo.Quino");
Builder.Include<DemoCoreGenerator>();
Builder.Include<DemoPathGenerator>();
Builder.Include<DemoCompanyGenerator>();
Builder.Include<DemoPersonGenerator>();
}
}
You’ll note that we only moved code around and didn’t have to change any implementation or add any new elements or anything that might introduce subtle errors in the metadata. Please note, the classes are all shown in a single code block above, but the pattern dictates that each class should be in its own file.
So far, we’ve only worked with generators that are dependent on the model generator. How do we access information—and elements—generated in other modules? For example, let’s include the security module and change a translation for a caption.
public class DemoModelElements
{
public DemoModelElements()
{
Classes = new DemoModelClasses();
Paths = new DemoModelPaths();
}
public ILanguage English { get; set; }
public ILanguage German { get; set; }
public SecurityModuleElements Security { get; set; }
public DemoModelClasses Classes { get; private set; }
public DemoModelPaths Paths { get; private set; }
}
public class DemoCoreGenerator : DemoDependentGenerator
{
protected override void AddCoreElements()
{
Elements.English = Builder.AddDisplayLanguage("en-US", "English");
Elements.German = Builder.AddDisplayLanguage("de-CH", "Deutsch");
Elements.Security = Builder.Include<SecurityModuleGenerator>().Elements;
}
protected override void AddProperties()
{
Elements.Security.Classes.User.Caption.SetValue(Elements.German, "Benutzer");
}
}
This approach works well with any module that has adhered to the pattern and exposes its elements in a standardized way. [5] In this case, the core module includes the security module and retains a reference to its elements. Any code that uses the core module will now have access not only to the core elements but also to the security elements, as well.
Another major benefit of using this pattern is that the resulting code is quite self-explanatory: it’s no mystery what Elements.Security.Classes.User.Caption refers to.
The previous pattern had a single monolithic file. The new pattern increases the number of files—possibly by quite a lot. It’s recommended to put these new files into the following structure:
[-] Models
    [+] Aspects
    [+] Elements
    [+] Generators
The “Aspects” folder isn’t new to this pattern, but it’s worth mentioning that any model-specific aspects should go into a separate folder.
That’s all for now. Happy modeling!
The IMetaModel is always available, and any part of the generation process can access metadata in the model at any time. However, the API for the model is quite generic and requires knowledge of the unique identifier or index for a piece of metadata.

The DemoCoreGenerator could also set up the builder (since it’s using the same builder object). To do that, you’d override AddCoreElements() and set up the model there. However, it’s clearer to keep it in the generator that actually owns the builder that is being configured.

I.e. the IElementProvider mentioned above.

Published by marco on 21. Nov 2012 23:08:51 (GMT-5)
Updated by marco on 8. Mar 2013 09:44:48 (GMT-5)
I was recently redesigning a web page and wanted to make it easier to use from touch-screen browsers. Links made only of text are relatively easy to click with a mouse, but tend to make poor touch targets. If the layout has enough space around the link, this can be remedied by applying CSS.
Suppose we have a box with three links in it, as shown to the right.
The first step is to make this box taller, so the logical thing to do is to set the height. We’ll have to pick a value, so set height: 40px on the gray box.
This isn’t exactly what we want, though; we’d rather have the vertical space equally distributed. Also, if you hover over the links, you can see that the space below the text is not active. Maybe we can try to add vertical-align: middle to align the content.
Unfortunately, this doesn’t have the desired effect. The vertical-align property works when used this way in table cells, but otherwise has no effect for block elements. Knowing that, we can set display: table-cell for the gray box.
And now the box has become longer, because the 50% width of the box is calculated differently for table cells than for regular boxes (especially when a table cell is found outside of a table).
Let’s abandon the vertical-alignment approach and try using positioning instead. Set position: relative and top: 25% to center the links vertically.
Now that looks much better, but the space above and below the links is still not active. Perhaps we can use the height trick again, to make the individual links taller as well. So we set height: 100% on each of the links.
We didn’t get the expected result, but we should have expected that: the links are inline elements and can only have a height set if we set display: inline-block on each link as well. We use inline-block rather than block so that the links stay on the same line.
The links are now the right size, but they stick out below the gray box, which isn’t what we wanted at all. We’re kind of out of ideas with this approach, but there is another way we can get the desired effect.
Let’s start with the original gray box and, instead of choosing a random height as we did above—40px—let’s set padding: 8px on the gray box to make room above and below the links.
With just one CSS style, we’ve already got the links nicely aligned and, as an added benefit, this technique scales even if the font size is changed. The 8-pixel padding is preserved regardless of how large the font gets. [1]
This approach seems promising, but the links are still not tall enough. The naive approach of setting height: 100% on the links probably won’t work as expected, but let’s try it anyway.
It looks like the links were already 100% of the height of the container; in hindsight it’s obvious, since the height of the gray box is determined by the height of the links. The 100% height refers to the client area of the gray box, which doesn’t include the padding.
We’d actually like the links to have padding above and below, just as the gray box has. As we saw above, the links will only honor the padding if they also have display: inline-block, so let’s set that in addition to padding: 8px.
We’re almost there. The only thing remaining is to make the vertical padding of the links overlap with the vertical padding of the gray box. We can do this by using a negative vertical margin, setting margin: -8px.
We finally have the result we wanted. The links are now large enough for the average finger to strike without trying too hard. Welcome to the CSS-enabled touch-friendly world of web design.
The code for the final example is shown below, with the sizing/positioning styles highlighted:
.gray-box
{
background-color: gray;
border: 1px solid black;
border-width: 1px 0;
width: 50%;
text-align: center;
padding: 8px 0;
}
.gray-box a
{
background-color: #8F8F8F;
display: inline-block;
padding: 8px 20px;
margin: -8px 0;
}
<div class="gray-box">
<a href="#" style="color: goldenrod">First</a>
<a href="#" style="color: gold">Second</a>
<a href="#" style="color: yellowgreen">Third</a>
</div>
You could use .8em instead and then the padding will scale with the font size. This would work just as well with the height. Let’s pretend that we’re working with a specification that requires an 8-pixel padding instead of a flexible one.
Published by marco on 8. Jan 2012 17:13:18 (GMT-5)
This graphic Geeks versus Non-Geeks when Doing Repetitive Tasks (How-to Geek) illustrates quite nicely how programmers approach the world of problem-solving.
The chart does not show just how much time must be spent before the programmer wins, that being dependent on the complexity of the task. The probability that the task will recur is also highly relevant, as automating a smallish, one-time task is useless. Neither of those things will stop a determined programmer, though, who will automate no matter what.
Published by marco on 9. Oct 2011 12:21:54 (GMT-5)
Updated by marco on 14. Sep 2014 10:54:47 (GMT-5)
tl;dr: Encodo Systems AG has moved from Perforce to Git and has written a manual for getting started for other users or companies looking to make the leap. It’s available for free at Encodo Git Handbook.
In the beginning, there was Microsoft Visual SourceSafe. And it was not good.
In 1994, I started working for a small software company. Source control was a structured network share until I started moving projects into Microsoft Visual SourceSafe, which was slow and balky and feature-poor, but it was better than manual merging.
Until it corrupted its own database, losing our entire history. Luckily, we were able to piece together the repository from local workspaces. But the search was on to find a replacement.
In 1997, we moved to Perforce and were very happy for many years. I even used the two-user free license to run a personal Perforce server on earthli for a while.
Then I moved to Switzerland to work for Opus Software AG, a very tech-savvy company, which was, of course, using source-control software. You haven’t heard of it, though, because it was an internal tool. It worked fine and even supported branches, but some operations didn’t scale as well as they should and it was a bit difficult to understand, in general.
So, I started a campaign to move to something else. We evaluated various alternatives, including Subversion and Perforce. Perforce won—mostly because Subversion’s merging support in pre-1.5 versions was laughable—and I was back on the source-control system I’d been using for almost ten years at that point.
For my personal projects, I switched to Mercurial because I was working with other users and the two-user limit for the free Perforce license was no longer adequate, but neither was I willing to cough up $800 per user in order to continue using Perforce. I chose Mercurial because I wanted a DVCS and a good friend/coworker of mine is a lead developer on the project, so he was around to help me when I had questions.
When I left that company to found Encodo Systems AG, Perforce was the logical choice for source control. We used it exclusively for our own projects for several years, using Subversion only to access repositories hosted by two different customers. What finally broke Perforce’s lock on Encodo was offline and remote work. We finally got a customer who wanted to work with Git instead of Perforce or Subversion and the customer is king, so we started to learn Git.
It wasn’t easy at first, especially if you don’t read any documentation or background information on Git concepts. But we got the hang of it and quickly became accustomed to the freedom offered by Git versus a central-server solution like Perforce.
So off we went to do an internal evaluation on source-control systems, this time including Mercurial/Kiln, Git, PlasticSCM, Perforce and TFS. We quickly decided against TFS for several reasons, primarily that it was too tightly-coupled to other Microsoft systems that we weren’t using or prepared to use yet. Perforce was, at the time (February 2011), a wholly centralized solution and had not yet made moves in a DVCS direction. PlasticSCM was good, but didn’t overwhelm us and finally Mercurial was also good, but even my aforementioned colleague—the developer on the Mercurial project—told us that there was no advantage relative to Git if we were already familiar and (relatively) comfortable with Git.
So Encodo moved all of its source code to several Git repositories hosted on an internal Gitorious server. Since Git is more a version-control toolkit/framework than a complete end-user solution, I/we use several tools on top of Git to make it more comfortable and to reduce points-of-failure.
So, after nearly 15 years of using Perforce almost exclusively, I am now almost exclusively a Git user (I still use Subversion and Mercurial very rarely for some personal and customer projects). An Encodo developer, Stephan Hauser, wrote a handbook in June to help everyone get up-to-speed on using Git. I recently updated it to account for the last several months of working with Git and we published it just last week. You can download it for free at Encodo Git Handbook.
Published by marco on 27. Mar 2011 20:35:28 (GMT-5)
The oft thought-provoking XKCD published a flow chart recently, called Good Code (XKCD), which outlines the two branches: doing it fast or doing it right. The chart is linked below.
Despite the panacea of Agile Development, you still can’t have both fast and right. While it is possible to write good code, the odds are good that that code will accomplish a task that no longer requires completion (indicated by the “requirements have changed” block).
Even if it’s decent code, it’s quite likely that it is “code with concessions” and, though it works, there is a forest of TODOs and missing integration tests blighting your conscience.
Published by marco on 26. Mar 2011 11:21:16 (GMT-5)
A few years ago, I developed a utility for syncing ratings, play counts and last-played times between the same set of songs on two different iTunes installations. I haven’t worked on it in years, but it’s quite well-written and full-featured and has rich documentation with a tutorial. You can download the Windows-only software for free.
I originally wrote this software because I was listening to a lot of music at work and rating it. When I got home, I didn’t have these ratings anymore because they were only stored on my work laptop. Likewise, the ratings at home weren’t making their way to my laptop. And it wasn’t only ratings: play count and last-played date also help the digital DJ decide what to play. That was a problem when I decided to select a playlist from the machine in the living room: it had no ratings and couldn’t decide very well which music to choose from its collection.
And it’s not just user data like ratings: there’s also the matter of song data, like album and genre, which are often wrong or incomplete. If you fix it on one machine, you—or your friends or partner—might appreciate having the improved tag information for free.
Since then, the world has moved on a bit, with the Home Sharing feature letting me play music from the office machine on the living room player and services like GrooveShark letting you keep your music collection in the cloud. However, I still have a couple of iTunes libraries around and still want to sync them now and again to have the most up-to-date information from which to launch a smart playlist or run the Genius. On top of that, a lot of people have Apple gadgets that work only with iTunes. There are a lot of iTunes libraries out there that could probably benefit from syncing. See “Who Needs TuneSync?” in the documentation to find out more.
What TuneSync does is load two iTunes library files, compares them using various heuristics and lets you synchronize selected information between the two. You can then store the changes to both files and force iTunes to reload its metadata from this library. You are in full control over the information that is synchronized from one library to the other and vice versa and you can even edit information directly if neither side is 100% correct.
Though the software was developed years ago, it still loads iTunes libraries for versions as recent as 10.2.x. The two libraries I compared had about 8000 and 7500 songs respectively (about 15–16MB XML files) and TuneSync was able to load them both in less than 15 seconds. Memory usage was about 150MB and the application responded smoothly and quickly for all operations.
TuneSync runs on any reasonably modern Windows operating system.
The best place to go for questions is the documentation, but here’s a brief overview of the functionality (with screenshots).
First you choose the two project files:
Once the files are loaded, the libraries are compared with the default heuristic (which is relatively strict) and TuneSync presents you with a comparison view. The one shown below is “Matched Songs”, but you can also see just the songs in each library, all songs or unmatched songs too. See Default Views in the documentation for more information.
As mentioned above, there are tabs for the common filters—all songs, songs in library one, library two, matched songs and unmatched songs—but in each view you can also search and filter by other criteria.
You can filter by type of match or simply by typing in the filter box to restrict the songs shown in any view. The screenshot above shows only songs that have different song data for which there is only one match. The various fields are colored according to the schema outlined above. See Songs in a Library and Column Data in the documentation for more information.
The screenshot above shows the song properties that you can show by selecting one or more songs (if you select multiple songs, the details are collapsed and summarized as much as possible). See Song Info Pane and Selecting Multiple Songs in the documentation for more information.
The default matching heuristic is quite strict and is best for libraries that have either been copied from one another or been synced before. Those matches are easy to synchronize (see below) without too much worry that there are invalid matches.
However, you can ask TuneSync to perform additional matches using custom heuristics, shown above. See Match Options Window in the documentation for more information.
Depending on the options chosen in the Match Window, you will see a lot more red here and will have to be more careful about which matches you accept. See Match Results Window in the documentation for more information.
Once you have all of the matches set up correctly, you can synchronize data between the libraries. You can either do this manually by using the Song Info Pane at the bottom of the window or by synchronizing multiple songs using certain criteria, as shown in the screenshot above. See Synchronize Individual Data, Using the Info Pane and Synchronize Multiple Data in the documentation for more information.
Once you’ve matched and synchronized songs, you can use the filters to search for modified songs in either library to verify the changes before exporting them back to the source files. If you’re running TuneSync on the same machine as the iTunes library that you’re replacing, you can have it replace the library for you; otherwise, you have to copy it to the proper location manually. See Tutorial: Checking Songs, Check Songs, Import from iTunes with TuneSync and Import from iTunes by Hand in the documentation for more information.
Published by marco on 19. Mar 2011 21:08:09 (GMT-5)
I’m currently revising the Encodo C# Handbook to update it for the last year’s worth of programming experience at Encodo, which includes a lot more experience with C# 4.0 features like optional parameters, dynamic types and more. The following is an expanded section on loose and tight coupling. A final draft should be available by the middle of April or so.
Whether to use loose or tight coupling for components depends on several factors. If a component on a lower level must access functionality on a higher level, this can only be achieved with loose coupling: e.g. connecting the two by using one or more delegates or callbacks.

If the component on the higher level needs to be coupled to a component on a lower level, then it’s possible to have them be more tightly coupled by using an interface. The advantage of using an interface over a set of one or more callbacks is that changes to the semantics of how the coupling should occur can be enforced. The example below should make this much clearer.
Imagine a class that provides a single event to indicate that it has received data from somewhere.
public class DataTransmitter
{
public event EventHandler<DataBundleEventArgs> DataReceived;
}
This is the classic way of loosely coupling components; any component that is interested in receiving data can simply attach to this event, like this:
public class DataListener
{
public DataListener(DataTransmitter transmitter)
{
transmitter.DataReceived += TransmitterDataReceived;
}
private void TransmitterDataReceived(object sender, DataBundleEventArgs args)
{
// Do something when data is received
}
}
Another class could combine these two classes in the following, classic way:
var transmitter = new DataTransmitter();
var listener = new DataListener(transmitter);
The transmitter and listener can be defined in completely different assemblies and need no dependency on any common code (other than the .NET runtime) in order to compile and run. If this is an absolute must for your component, then this is the pattern to use for all events. Just be aware that the loose coupling may introduce semantic errors—errors in usage that the compiler will not notice.
For example, suppose the transmitter is extended to include a new event, NoDataAvailableReceived.
public class DataTransmitter
{
public event EventHandler<DataBundleEventArgs> DataReceived;
public event EventHandler NoDataAvailableReceived;
}
Let’s assume that the previous version of the interface threw a timeout exception when it had not received data within a certain time window. Now, instead of throwing an exception, the transmitter triggers the new event instead. The code above will no longer indicate a timeout error (because no exception is thrown) nor will it indicate that no data was transmitted.
One way to fix this problem (once detected) is to hook the new event in the DataListener constructor. If the code is to remain highly decoupled—or if the interface cannot be easily changed—this is the only real solution.
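Here is a sketch of that fix; the listener simply attaches to the new event as well (the handler body is an assumption, since the handbook doesn't show one).

public class DataListener
{
  public DataListener(DataTransmitter transmitter)
  {
    transmitter.DataReceived += TransmitterDataReceived;
    transmitter.NoDataAvailableReceived += TransmitterNoDataAvailableReceived;
  }

  private void TransmitterDataReceived(object sender, DataBundleEventArgs args)
  {
    // Do something when data is received
  }

  private void TransmitterNoDataAvailableReceived(object sender, EventArgs args)
  {
    // React to the timeout condition that is no longer reported as an exception
  }
}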
Imagine now that the transmitter becomes more sophisticated and defines more events, as shown below.
public class DataTransmitter
{
public event EventHandler<DataBundleEventArgs> DataReceived;
public event EventHandler NoDataAvailableReceived;
public event EventHandler ConnectionOpened;
public event EventHandler ConnectionClosed;
public event EventHandler<DataErrorEventArgs> ErrorOccurred;
}
Clearly, a listener that attaches and responds appropriately to all of these events will provide a much better user experience than one that does not. The loose coupling of the interface thus far requires all clients of this interface to be proactively aware that something has changed and, once again, the compiler is no help at all.
If we can change the interface—and if the components can include references to common code—then we can introduce tight coupling by defining an interface with methods instead of individual events.
public interface IDataListener
{
void DataReceived(IDataBundle bundle);
void NoDataAvailableReceived();
void ConnectionOpened();
void ConnectionClosed();
void ErrorOccurred(Exception exception, string message);
}
With a few more changes, we have a more tightly coupled system, but one that will enforce changes on clients: DataTransmitter now calls through the new interface and DataListener implements IDataListener. Now, when the transmitter requires changes to the IDataListener interface, the compiler will enforce that all listeners are also updated.
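For illustration, here is one shape the tightly coupled version might take: the transmitter receives the listener directly and calls the interface methods instead of raising events. The constructor parameter and the ProcessIncomingData method are assumptions for this sketch, not the handbook's actual code.

public class DataTransmitter
{
  private readonly IDataListener _listener;

  public DataTransmitter(IDataListener listener)
  {
    _listener = listener;
  }

  private void ProcessIncomingData(IDataBundle bundle)
  {
    // Instead of raising DataReceived, call the interface method directly.
    // Adding a method to IDataListener now forces every listener to implement it.
    _listener.DataReceived(bundle);
  }
}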
Published by marco on 19. Mar 2011 21:00:03 (GMT-5)
I’m currently revising the Encodo C# Handbook to update it for the last year’s worth of programming experience at Encodo, which includes a lot more experience with C# 4.0 features like optional parameters, dynamic types and more. The following is an expanded section on working with Linq. A final draft should be available by the middle of April or so.
When using expressions from System.Linq, be careful not to sacrifice legibility or performance simply in order to use Linq instead of more common constructs. For example, the following loop sets a property for those elements in a list where a condition holds.
foreach (var pair in Data)
{
if (pair.Value.Property is IMetaRelation)
{
pair.Value.Value = null;
}
}
This seems like a perfect place to use Linq; assuming an extension method ForEach(this IEnumerable<T>), we can write the loop above using the following Linq expression:
Data.Where(pair => pair.Value.Property is IMetaRelation).ForEach(pair => pair.Value.Value = null);
This formulation, however, is more difficult to read because the condition and the loop are now buried in a single line of code, but a more subtle performance problem has been introduced as well. We have made sure to evaluate the restriction (“Where”) first so that we iterate the list (with “ForEach”) with as few elements as possible, but we still end up iterating twice instead of once. This could cause performance problems in border cases where the list is large and a large number of elements satisfy the condition.
Linq is mostly a blessing, but you always have to keep in mind that Linq expressions are evaluated lazily. Therefore, be very careful when using the Count() method, because it will iterate over the entire collection (if the backing collection is of base type IEnumerable<T>). Linq is optimized to check the actual backing collection, so if the IEnumerable<T> you have is actually a list and the count is requested, Linq will use the Count property instead of counting elements naively.
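As a small illustration of the difference (GenerateNumbers() is a made-up helper for this sketch, not something from the handbook):

IEnumerable<int> GenerateNumbers()
{
  for (var i = 0; i < 1000000; i++)
  {
    yield return i; // a lazy sequence with no backing collection
  }
}

var lazy = GenerateNumbers();
var list = lazy.ToList();

var slowCount = lazy.Count(); // enumerates all one million elements to count them
var fastCount = list.Count(); // detects the backing list and reads its Count property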
A few concrete examples of other issues that arise due to lazy evaluation are illustrated below.
You can accidentally change the value of a captured variable before the sequence is evaluated. Since ReSharper will complain about this behavior even when it does not cause unwanted side-effects, it is important to understand which cases are actually problematic.
var data = new[] { "foo", "bar", "bla" };
var otherData = new[] { "bla", "blu" };
var overlapData = new List<string>();
foreach (var d in data)
{
if (otherData.Where(od => od == d).Any())
{
overlapData.Add(d);
}
}
// We expect one element in the overlap, “bla”
Assert.AreEqual(1, overlapData.Count);
The reference to the variable d will be flagged by ReSharper and marked as an “access to a modified closure”. This is a reminder that a variable referenced—or “captured”—by the lambda expression—closure—will have the last value assigned to it rather than the value that was assigned to it when the lambda was created. In the example above, the lambda is created with the first value in the sequence, but since we only use the lambda once, and then always before the variable has been changed, we don’t have to worry about side-effects. ReSharper can only detect that a variable referenced in a closure is being changed within the scope that it checks; it lets you know so that you can verify that there are no unwanted side-effects.
Even though there isn’t a problem, you can rewrite the foreach-statement above as the following code, eliminating the “Access to modified closure” warning.
var overlapData = data.Where(d => otherData.Where(od => od == d).Any()).ToList();
The example above was tame in that the program ran as expected despite capturing a variable that was later changed. The following code, however, will not run as expected:
var data = new[] { "foo", "bar", "bla" };
var otherData = new[] { "bla", "blu" };
var threshold = 2;
var results = data.Where(d => d.Length == threshold);
var overlapData = data.Where(d => otherData.Where(od => od == d).Any());
if (overlapData.Any())
{
threshold += 1;
}
// All elements are three characters long, so we expect no matches
Assert.AreEqual(0, results.Count());
Here we have a problem because the closure is evaluated after a local variable that it captured has been modified, resulting in unexpected behavior. Whereas it’s possible that this is exactly what you intended, it’s not a recommended coding style. Instead, you should move the calculation that uses the lambda after any code that changes the variables that it captures:
var threshold = 2;
var overlapData = data.Where(d => otherData.Where(od => od == d).Any());
if (overlapData.Any())
{
threshold += 1;
}
var results = data.Where(d => d.Length == threshold);
This is probably the easiest way to get rid of the warning and make the code clearer to read.
Published by marco on 19. Mar 2011 20:30:46 (GMT-5)
Updated by marco on 19. Mar 2011 20:35:22 (GMT-5)
PHPDoc is a popular tool for generating documentation for PHP projects. I made a whole lot of improvements to it for PHP5 and updated all the skins to look less boxy, have nicer and more informative icons and be easier to use, and then created an earthli fork. This article includes a full feature list and screenshots.
The earthli WebCore (the software that runs this web site) is open-source. It is also relatively well-documented. The documentation is generated using PHPDoc, but a better version than that available in the main fork of PHPDoc found on the main site or in SourceForge.
Though PHPDoc does a decent job of gathering information and making it available to the templates, there are a few problems with the main fork:
A long time ago—when PHP4 was still young—I contributed a whole new set of templates—modestly called “earthli” and “earthli:DOM”—to the project and brought the rendering up to a decent level. There were still problems, but it was—in my eyes—worlds better than any of the existing templates.
Things stayed like that for a while.
Then I ported my framework from PHP4 to PHP5 in early 2010 and discovered that PHPDoc was again limping along a bit, generating output that no longer met my standards.
So, I made a lot of improvements and again basically rewrote one set of templates (this time called “earthli-v2”) that I thought looked clean and offered the following features:

- UTF-8 and ISO-8859-1 are supported; defaults to UTF-8.
- XHTML and HTML are supported; defaults to HTML.
- object, stdClass and mixed; defaults to mixed.
- access and abstract tags are generated; since these properties are also indicated by the icon, defaults to false.
- css, img and none; defaults to css.
- Defaults to false.
- The default (a PHPDoc standard) and earthli skins are included; defaults to earthli.

The old style isn’t horrible, but it’s a bit dated, with too many borders, blurry icons and too many bold fonts.
The new style is cleaner, has far fewer borders, better margins and alignments and nicer icons (for all elements, with access visibility for all element types) as well as much more legible placement and more information, including direct links to source for all elements, much nicer signature-formatting and tamer colors.
I was in contact with the project maintainer but was never able to upload my changes into the main branch of PHPDoc. There have been no updates on the main line since late 2009 and I don’t know whether the project has died or not. I only just realized that I never officially published my changes, so I’m officially making the earthli fork available as a Mercurial repository or as a compressed archive.
Published by marco on 18. Dec 2010 01:21:38 (GMT-5)
Updated by marco on 22. Nov 2012 19:42:16 (GMT-5)
tl;dr: This is a long-winded way of advising you to always be sure what you're comparing when you build low-level algorithms that will be used with arbitrary generic arguments. The culprit in this case was the default comparator in a HashSet<T>, but it could be anything. It ends with cogitation about software processes in the real world.
Imagine that you have a framework with support for walking arbitrary object graphs in the form of a GraphWalker
. Implementations of this interface complement a generalized algorithm.
This algorithm generates nodes corresponding to various events generated by the graph traversal, like beginning or ending a node or edge or encountering a previously processed node (in the case of graphs with cycles). Such an algorithm is eminently useful for formatting graphs into a human-readable format, cloning said graphs or other forms of processing.
A crucial feature of such a GraphWalker
is to keep track of the nodes it has seen before in order to avoid traversing the same node multiple times and going into an infinite loop in graphs with cycles. For subsequent encounters with a node, the walker handles it differently—generating a reference event rather than a begin node event.
A common object graph is the AST for a programming language. The graph walker can be used to quickly analyze such ASTs for nodes that match particular conditions.
Let’s take a look at a concrete example, with a little language that defines simple boolean expressions:
OR(
(A < 2)
(B > A)
)
It’s just an example and we don’t really have to care about what it does, where A
and B
came from or the syntax. What matters is the AST that we generate from it:
1 Operator (OR)
2 Operator (<)
3 Variable (A)
4 Constant (2)
5 Operator (>)
6 Constant (B)
7 Variable (A)
When the walker iterates over this tree, it generates the following events (note that the numbers at the front of each line correspond to the objects in the diagram above):
1 begin node
1 begin edge
2 begin node
2 begin edge
3 begin node
3 end node
4 begin node
4 end node
2 end edge
2 end node
5 begin node
5 begin edge
6 begin node
6 end node
7 begin node
7 end node
5 end edge
5 end node
1 end edge
Now that’s the event tree we expect. This is also the event tree that we get for the objects that we’ve chosen to represent our nodes (Operator
, Variable
and Constant
in this case). If, for example, we process the AST and pass it through a formatter for this little language, we expect to get back exactly what we put in (namely the code in Listing 1). Given the event tree, it’s quite easy to write such a formatter—namely, by handling the begin node (output the node text), begin edge (output a “(”) and end edge (output a “)”) events.
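For illustration, here is a minimal sketch of such a formatter. The handler names below are assumptions made for this example; the article doesn't show the actual walker API.
class SimpleFormatter
{
    private readonly System.Text.StringBuilder _result = new System.Text.StringBuilder();

    // First encounter with a node: output its text.
    public void BeginNode(object node)
    {
        _result.Append(node).Append(" ");
    }

    // Descending into a node's children: open a parenthesis.
    public void BeginEdge(object node)
    {
        _result.Append("(");
    }

    // All children processed: close the parenthesis.
    public void EndEdge(object node)
    {
        _result.Append(")");
    }

    public override string ToString()
    {
        return _result.ToString();
    }
}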
So far, so good?
However, now imagine that we discover a bug in other code that uses these objects: when two different objects refer to the same variable, they need to be considered equal. That is, we update the equality methods—in the case of .NET, Equals() and GetHashCode()—for Variable.
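A sketch of what that change might look like (the Variable class itself isn't shown in the article, so its exact shape here is an assumption):
class Variable
{
    public Variable(string name)
    {
        Name = name;
    }

    public string Name { get; private set; }

    // Two Variables with the same name are now considered equal...
    public override bool Equals(object obj)
    {
        var other = obj as Variable;
        return other != null && Name == other.Name;
    }

    // ...so they must also produce the same hash code.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}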
As soon as we do, however, the sample from Listing 1 now formats as:
OR(
(A < 2)
(B > )
)
Now we have to figure out what happened. A good first step is to see what the corresponding event tree looks like now. We discover the following:
1 begin node
1 begin edge
2 begin node
2 begin edge
3 begin node
3 end node
4 begin node
4 end node
2 end edge
2 end node
5 begin node
5 begin edge
6 reference
7 begin node
7 end node
5 end edge
5 end node
1 end edge
The change is in the sixth node, which has now become a reference because we changed how equality is handled for Variables. The algorithm now considers any two Variables with the same name to be equivalent even if they are two different object references.
If we look back at how we wrote the simple formatter above, we only handled the begin node, begin edge and end edge events. If we throw in a handler for the reference event and output the text of the node, we’re back in business and have “fixed” the formatter.
But we ignore the more subtle problem at our own peril: namely, that the graph walking-code is fragile in that its behavior changes due to seemingly unrelated changes in the arguments that are passed. Though we have a quick fix above, we need to think about providing more stability in the algorithm—especially if we’re providers of low-level framework functionality. [1]
The walker algorithm uses a HashSet<T>
to track the nodes that it has previously encountered. However, the default comparator—again, in .NET—leans on the equality functions of the objects stored in the set to determine membership.
The first solution—or rather, the second one, as we already “fixed” the problem with what amounts to a hack above by outputting references as well—is to change the equality comparator for the HashSet<T>
to explicitly compare references. We make that change and we can once again remove the hack because the algorithm no longer generates references for subsequent variable encounters.
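In .NET, that means passing a custom IEqualityComparer<T> to the HashSet<T> constructor. A minimal sketch (the class name is mine, not the article's):
using System.Collections.Generic;
using System.Runtime.CompilerServices;

sealed class ReferenceEqualityComparer<T> : IEqualityComparer<T>
    where T : class
{
    public bool Equals(T x, T y)
    {
        // Compare object identity, ignoring any overridden Equals().
        return ReferenceEquals(x, y);
    }

    public int GetHashCode(T obj)
    {
        // Hash on identity as well, ignoring any overridden GetHashCode().
        return RuntimeHelpers.GetHashCode(obj);
    }
}

// For example: var visited = new HashSet<object>(new ReferenceEqualityComparer<object>());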
However, we’re still not done. We’ve now not only gotten our code running but we’ve fixed the code for the algorithm itself so the same problem won’t crop up again in other instances. That’s not bad for a day’s work, but there’s still a nagging problem.
What happens if the behavior that was considered unexpected in this case is exactly the behavior that another use of the algorithm expects? That is, it may well be that other types of graph walker will actually want to be able to control what is and is not a reference by changing the equivalence functions for the nodes. [2]
Luckily, callers of the algorithm already pass in the graph walker itself, the methods of which the algorithm already calls to process nodes and edges. A simple solution is to add a method to the graph walker interface to ask it to create the kind of HashSet<T>
that it would like to use to track references.
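Something along these lines, assuming a hypothetical IGraphWalker interface (the article doesn't show the real one):
using System.Collections.Generic;

interface IGraphWalker
{
    // ...the existing node- and edge-processing methods...

    // Each walker decides how previously encountered nodes are recognized: an AST
    // walker can return a set with a reference-equality comparer, while other walkers
    // can rely on the nodes' own Equals() and GetHashCode().
    HashSet<object> CreateVisitedSet();
}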
So how much time does this all take to do? Well, the first solution—the hack in application code—is the quickest, with time spent only on writing the unit test for the AST and verifying that it once again outputs as expected.
If we make a change to the framework, as in the second solution where we change the equality operator, we have to create unit tests to test the behavior of the AST in application code, but using test objects in the framework unit tests. That’s a bit more work and we may not have time for it.
The last suggestion—to extend the graph walker interface—involves even more work because we then have to create two sets of test objects: one set that tests a graph walker that uses reference equality (as the AST in the application code) and one that uses object equality (to make sure that works as well).
It is at this point that we might get swamped and end up working on framework code and unit tests that verify functionality that isn’t even being used—and certainly isn’t being used by the application with the looming deadline. However, we’re right there, in the code, and will never be better equipped to get this all right than we are right now. But what if we just don’t have time? What if there’s a release looming and we should just thank our lucky stars that we found the bug? What if there’s no time to follow the process?
Well, sometimes the process has to take a back seat, but that doesn’t mean we do nothing. Here are a few possibilities:
What about those who quite rightly frown at the third possibility because it would provide a solution for what amounts to a potential—as opposed to actual—problem? It’s really up to the developer here and experience really helps. How much time does it take to write the code? How much does it change the interface? How many other applications are affected? How likely is it that other implementations will need this fix? Are there potential users who won’t be able to make the fix themselves? Who won’t be able to recompile and just have to live with the reference-only equivalence? How likely is it that other code will break subtly if the fix is not made? It’s not an easy decision either way, actually.
Though purists might be appalled at the fast and loose approach to correctness outlined above, pragmatism and deadlines play a huge role in software development. The only way to avoid missing deadlines is to have fallback plans to ensure that the code is clean as soon as possible rather than immediately as a more stringent process would demand.
And thus ends the cautionary tale of making assumptions about how objects are compared and how frameworks are made.
Published by marco on 6. May 2010 22:39:54 (GMT-5)
Updated by marco on 19. Mar 2011 21:03:14 (GMT-5)
According to the official documentation, the sealed
keyword in C# serves the following dual purpose:
“When applied to a class, the sealed modifier prevents other classes from inheriting from it. […] You can also use the sealed modifier on a method or property that overrides a virtual method or property in a base class. This enables you to allow classes to derive from your class and prevent them from overriding specific virtual methods or properties.”
Each inheritable class and overridable method in an API is part of the surface of that API. Functionality on the surface of the API costs money and time because it implies a promise to support that API through subsequent versions. The provider of the API more-or-less guarantees that potential modifications—through inheritance or overriding—will not be irrevocably broken by upgrades. At the very least, it implies that so-called breaking changes are well-documented in a release and that an upgrade path is made available.
In C#, the default setting for classes and methods is that classes are not sealed and methods are sealed (non-virtual, which amounts to the same thing). Additionally, the default visibility in C# is internal, which means that the class or method is only visible to other classes in the assembly. Thus, the default external API for an assembly is empty. The default internal API allows inheritance everywhere.
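To spell out those defaults with a small, illustrative example (not from the article):
// With no modifiers at all, this class is internal (invisible outside the assembly),
// can be inherited from within the assembly, and its method cannot be overridden
// because it is not virtual.
class Widget
{
    public void Refresh()
    {
    }
}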
Some designers recommend the somewhat radical approach of declaring all classes sealed and leaving methods as non-virtual by default. That is, they recommend reducing the surface area of the API to only that which is made available by the implementation itself. The designer should then carefully decide which classes should be extensible—even within the assembly, because designers have to support any API that they expose, even if it’s only internal to the assembly—and unseal them, while deciding which methods should be virtual.
From the calling side of the equation, sealed classes are a pain in the ass. The framework designer, in his ineffable wisdom, usually fails to provide an implementation that does just what the caller needs. With inheritance and virtual methods, the caller may be able to get the desired functionality without rewriting everything from scratch. If the class is sealed, the caller has no recourse but to pull out Reflector™ and make a copy of the code, adjusting the copy until it works as desired.
Until the next upgrade, that is, when the original version gets a few bug fixes or changes and the copied version begins to diverge from it. It's not so clear-cut whether to seal classes or not, but the answer is—as with so many other things—likely a well-thought-out balance of both approaches.
Sealing methods, on the other hand, is simply a way of reverting that method back to the default state of being non-virtual. It can be quite useful, as I discovered in a recent case, shown below.
I started with a class for which I wanted to customize the textual representation—a common task.
class Expression
{
public override string ToString()
{
// Output the expression in human-readable form
}
}
class FancyExpression : Expression
{
public override string ToString()
{
// Output the expression in human-readable form
}
}
So far, so good; extremely straightforward. Imagine dozens of other expression types, each overriding ToString()
and producing custom output.
Time passes and it turns out that the formatting for expressions should be customizable based on the situation. The most obvious solution is to declare an overloaded version of ToString()
and then call the new overload from the overload inherited from the library, like this:
class Expression
{
public override string ToString()
{
return ToString(ExpressionFormatOptions.Compact);
}
public virtual string ToString(ExpressionFormatOptions options)
{
// Output the expression in human-readable form
}
}
Since the new overload is a more powerful version of the basic ToString()
, we just redefine the latter in terms of the former, choosing appropriate default options. That seems simple enough, but now the API has changed in a seemingly unenforceable way. Enforceable, in this context, means that the API can use the semantics of the language to force callers to use it in a certain way. Using the API in non-approved ways should result in a compilation error.
This new version of the API now has two virtual methods, but the overload of ToString()
without a parameter is actually completely defined in terms of the second overload. Not only is there no longer any reason to override it, but it would be wrong to do so—because the API calls for descendants to override the more powerful overload and to be aware of and handle the new formatting options.
But, this is the second version of the API and there are already dozens of descendants that override the basic ToString()
method. There might even be descendants in other application code that isn’t even being compiled at this time. The simplest solution is to make the basic ToString()
method non-virtual and be done with it. Descendants that overrode that method would no longer compile; maintainers could look at the new class declaration—or the example-rich release notes!—to figure out what changed since the last version and how best to return to a compilable state.
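The migration for such a descendant is mechanical; a sketch, reusing the FancyExpression example from above:
class FancyExpression : Expression
{
    // Before, this class overrode the parameterless ToString(); now it overrides
    // the overload that accepts options instead.
    public override string ToString(ExpressionFormatOptions options)
    {
        // Output the expression in human-readable form, honoring the options
    }
}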
But ToString()
comes from the object
class and is part of the .NET system. This is where the sealed keyword comes in handy. Just seal the basic method to prevent overrides and the compiler will take care of the rest.
class Expression
{
public override sealed string ToString()
{
return ToString(ExpressionFormatOptions.Compact);
}
public virtual string ToString(ExpressionFormatOptions options)
{
// Output the expression in human-readable form
}
}
Even without release notes, a competent programmer should be able to figure out what to do. A final tip, though, is to add documentation so that everything’s crystal clear.
class Expression
{
/// <summary>
/// Returns a text representation of this expression.
/// </summary>
/// <returns>
/// A text representation of this expression.
/// </returns>
/// <remarks>
/// This method can no longer be overridden; instead,
/// override <see cref="ToString(ExpressionFormatOptions)"/>.
/// </remarks>
/// <seealso cref="ToString(ExpressionFormatOptions)"/>
public override sealed string ToString()
{
return ToString(ExpressionFormatOptions.Compact);
}
/// <summary>
/// Gets a text representation of this expression using the given
/// <paramref name="options"/>.
/// </summary>
/// <param name="options">The options to apply.</param>
/// <returns>
/// A text representation of this expression using the given
/// <paramref name="options"/>
/// </returns>
public virtual string ToString(ExpressionFormatOptions options)
{
// Output the expression in human-readable form
}
}
Published by marco on 28. Apr 2010 21:25:44 (GMT-5)
Updated by marco on 28. Apr 2010 23:02:57 (GMT-5)
The following tip was developed using Ubuntu 9.1x (Hardy Heron) with OpenVPN 2.1rc19. It builds on the setup from Part I.
Part I of this guide to configuring a local firewall for OpenVPN introduced you to using iptables
on Linux. It also included a script for OpenVPN that opened and closed the firewall for specific IP addresses. If you haven’t read it already, you should probably go do that first.
Unfortunately, it turns out that the firewall configuration from part I is not watertight because it still allows FORWARDs
for all IP addresses. If you’ll recall, we solved this problem for INPUTs
by closing them by default and selectively opening them.
The first step is to ascertain that the firewall is configured as we expect. A call to sudo iptables -nL
elicits the following output:
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
As you can see, the default policy for FORWARD
is ACCEPT
, which allows anyone to access other IP addresses from this machine. In Part I, you created a file named /etc/iptables.uprules
in which you stored the default configuration of the firewall. You’ll want to change that as shown below (the changes are highlighted):
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i eth0 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A FORWARD -i eth0 -j ACCEPT
-A FORWARD -i lo -j ACCEPT
COMMIT
Restart networking by executing sudo /etc/init.d/networking restart
. A call to sudo iptables -nL
should now elicit the following output (the main changes are highlighted):
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Now all IP forwarding requests are blocked by default. Those for the lo
and eth0
interfaces are, of course, still enabled, to allow the machine to be reachable both by itself and the local network.
The final step is to change the firewall configuration script to open up IP forwarding for employees, but not for strangers. Since this is a FORWARD
rule, not an INPUT
one, the script has to make sure to remove all firewall rules for the client IP address instead of just the INPUT rules as it did previously. The script below extends the one from Part I: rules are now added to both the INPUT and FORWARD chains and removed from every chain when a client disconnects.
function inlist
{
less `dirname $0`/$1 | egrep "^${CLIENTCERT}$" > /dev/null
if [ $? -eq 0 ]; then
return 0
else
return 1
fi
}
function get_next_matching_firewall_rule
{
ip_address=$1
channel=$2
RULE="`iptables -L $channel -n --line-numbers | grep $ip_address | head -n 1`"
}
function drop_rule_from_iptables
{
rule="$1"
channel="$2"
echo " Drop rule [$rule] for channel [$channel]"
line_number=`echo "$rule" | awk '{print $1}'`
iptables -D $channel $line_number
}
function add_port_to_iptables
{
source_ip=$1
destination_ip=$2
protocol=$3
port=$4
iptables -A INPUT -i tun0 -s $source_ip -d $destination_ip -p $protocol --dport $port -j ACCEPT
iptables -A FORWARD -i tun0 -s $source_ip -d $destination_ip -p $protocol --dport $port -j ACCEPT
}
function add_destination_to_iptables
{
source_ip=$1
destination_ip=$2
iptables -A INPUT -i tun0 -s $source_ip -d $destination_ip -j ACCEPT
iptables -A FORWARD -i tun0 -s $source_ip -d $destination_ip -j ACCEPT
}
function open_firewall_for_strangers
{
echo " Add route for DNS"
add_port_to_iptables $CLIENTIP 192.168.1.1 "UDP" 53
echo " Add route for Windows shares"
add_port_to_iptables $CLIENTIP 192.168.1.5 "TCP" 139
add_port_to_iptables $CLIENTIP 192.168.1.5 "TCP" 445
return 0
}
function open_firewall_for_employees
{
echo " Add routes for all ip addresses"
iptables -A INPUT -i tun0 -s $CLIENTIP -j ACCEPT
iptables -A FORWARD -i tun0 -s $CLIENTIP -j ACCEPT
return 0
}
function open_firewall
{
echo "Opening firewall for $CLIENTCERT @ [$CLIENTIP]"
# TODO Add filtering for other lists, if desired
# inlist "MYGROUP.list"
#if [ $? -eq 0 ]; then
# echo " Certificate found in MYGROUP list"
# open_firewall_for_MYGROUP
# return 0
#else
inlist "strangers.list"
if [ $? -eq 0 ]; then
echo " Certificate found in strangers list"
open_firewall_for_strangers
return 0
else
inlist "employees.list"
if [ $? -eq 0 ]; then
echo " Certificate found in employee list"
open_firewall_for_employees
return 0
else
echo " Certificate not found in any list"
return 1
fi
fi
}
function close_firewall_channel
{
channel=$1
get_next_matching_firewall_rule $CLIENTIP $channel
while [ -n "$RULE" ]
do
drop_rule_from_iptables "$RULE" $channel
get_next_matching_firewall_rule $CLIENTIP $channel
done
}
function close_firewall
{
echo "CloseFirewall for [$CLIENTIP]"
close_firewall_channel "INPUT"
close_firewall_channel "FORWARD"
close_firewall_channel "OUTPUT"
}
# Main
OPERATION=$1
CLIENTIP=$2
CLIENTCERT=$3
case "$1" in
add)
close_firewall
open_firewall
;;
update)
close_firewall
open_firewall
;;
delete)
close_firewall
;;
*)
echo "Unknown operation"
exit 1
esac
exit $?
Since you only changed the firewall configuration script, there is no need to restart OpenVPN.
You can test to verify that the firewall is updated properly by simply executing the /etc/openvpn/configfirewall.sh
script with various parameters. The expected parameters are an operation ("add" or "delete" for testing purposes), an IP address (which should be chosen so as not to interfere with any addresses assigned by either OpenVPN or a DHCP server) and a name (matched against the names in your lists).
To test what would happen when an employee connects through OpenVPN, execute the following command:
sudo /etc/openvpn/configfirewall.sh add 192.168.40.3 John_Doe
You should see the following output from the script:
CloseFirewall for [192.168.40.3]
OpenFirewall for John_Doe @ [192.168.40.3]
 Certificate found in employee list
 Add routes for all ip addresses
This sounds about right and it looks like the script ran as expected. You can check that the firewall was configured as expected with a call to sudo iptables -nL
, which should now elicit the following output (the main changes are highlighted):
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  192.168.40.3         0.0.0.0/0

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  192.168.40.3         0.0.0.0/0
As you can see, the firewall accepts all INPUT
and FORWARD
from employees. Removing this test employee is as simple as executing:
sudo /etc/openvpn/configfirewall.sh delete 192.168.40.3 John_Doe
You should see the following output from the script:
CloseFirewall for [192.168.40.3]
 Drop rule [4    ACCEPT     all  --  192.168.40.3         0.0.0.0/0           ] for channel [INPUT]
 Drop rule [4    ACCEPT     all  --  192.168.40.3         0.0.0.0/0           ] for channel [FORWARD]
A call to sudo iptables -nL
should now elicit the following output, where the rules for the employee have been removed:
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
You should really test with one user from each list, so the next user to test is a stranger. Add a stranger by calling the script with the stranger's name instead of the employee's name:
sudo /etc/openvpn/configfirewall.sh add 192.168.40.3 John_Stranger
You should see the following output from the script:
CloseFirewall for [192.168.40.3]
OpenFirewall for John_Stranger @ [192.168.40.3]
 Certificate found in strangers list
 Add route for DNS
 Add route for Windows shares
A call to sudo iptables -nL
should now elicit the following output:
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     udp  --  192.168.40.3         192.168.1.1          udp dpt:53
ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:139
ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:445

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     udp  --  192.168.40.3         192.168.1.1          udp dpt:53
ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:139
ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:445
For strangers, the firewall accepts only requests on the ports and IP addresses explicitly opened by the script and drops all other FORWARD requests. Removing this test stranger is as simple as executing:
sudo /etc/openvpn/configfirewall.sh delete 192.168.40.3 John_Stranger
You should see the following output from the script:
CloseFirewall for [192.168.40.3]
 Drop rule [4    ACCEPT     udp  --  192.168.40.3         192.168.1.1          udp dpt:53 ] for channel [INPUT]
 Drop rule [4    ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:139 ] for channel [INPUT]
 Drop rule [4    ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:445 ] for channel [INPUT]
 Drop rule [4    ACCEPT     udp  --  192.168.40.3         192.168.1.1          udp dpt:53 ] for channel [FORWARD]
 Drop rule [4    ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:139 ] for channel [FORWARD]
 Drop rule [4    ACCEPT     tcp  --  192.168.40.3         192.168.1.5          tcp dpt:445 ] for channel [FORWARD]
A call to sudo iptables -nL
should now elicit the following output, where the rules for the stranger have been removed:
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
You can use the script this way to test the firewall configuration without actually logging in through OpenVPN. When everything is set, you should still log in with OpenVPN as a user from each list to verify that the firewall is doing what you think it is doing. In fact, that’s exactly why there is a Part II to this article: We tested by adding a user to the strangers list and logging in and noticed that we were able to ping many more servers than we had configured. Don’t let that happen to you!
So, there’s one more trick that you can use to make testing via OpenVPN easier. Since you have to be outside the network to test tunneling in via VPN, you run into the problem of testing as a stranger because strangers probably won’t have rights to open a shell on the OpenVPN server. That is, you need to be able to do this:
Since you’re a stranger, you can no longer open a shell on the OpenVPN server and alter the configuration.
Here are some ways of getting around this problem:
Another way around this is to add an exception for the OpenVPN server to all configurations (strangers, employees, etc.) so that you can test almost everything. To do this, just add a rule for the OpenVPN server (assumed to be on 192.168.1.1) as follows:
add_destination_to_iptables $CLIENTIP 192.168.1.1
When you’re finished testing, make sure to remove the hack.
Finally, here are samples of all of the files modified in this tutorial. See Part I for the other files.
Published by marco on 20. Apr 2010 22:37:16 (GMT-5)
Updated by marco on 21. Apr 2010 22:25:01 (GMT-5)
The following tip was developed using Ubuntu 9.1x (Hardy Heron) with OpenVPN 2.1rc19. It was originally published on the Encodo blogs and cross-published here.
There are dozens of guides around that describe how to optimally configure the iptables
firewall on Linux for OpenVPN. There’s even a script installed by default that is extremely well-commented and shows how to close down the firewall, then open up only very selected ports and protocols for optimal browsing. However, all of those guides assume that the machine on which OpenVPN is installed is also the firewall separating an external network (the DMZ) from an internal one. Well, what if you have a dedicated firewall and run the OpenVPN server on a machine running in the internal network?
This tutorial assumes that you’ve already followed the instructions for setting up OpenVPN and that you’ve also set up a Public Key Infrastructure (PKI). That means that access to your internal network via OpenVPN is secured and will only authorize users that have a proper certificate and password.
All of the files and scripts mentioned in this tutorial are available for download as files at the end of the article.
Since the external firewall routes requests to OpenVPN directly to the internal machine, it cannot be used to restrict the actions of users that are tunneling into the internal network. Luckily, the default behavior is that users only have access to the OpenVPN server itself, which gives you time to consider how, exactly, you want to open things up.
Here are some questions you need to answer:
For many organizations, the whole point of using OpenVPN is to let users work as if they are on the internal network, but from outside the physical office. In that case, the answers to the questions above will in many cases be:
Let’s take care of that trivial case first, then. Execute sudo iptables -nL
to show the current firewall configuration. You should see something like the following:
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
This table indicates that all input, output and forward requests are accepted. OUTPUT
requests are not interesting for this exercise, as they are generated by software running on the server itself, but INPUT
and FORWARD
requests bear more scrutiny. It looks like the firewall is already configured to allow access to everything your users need: There are no restrictions on inputs, which means that the firewall will allow requests on all ports and protocols for the local machine. There are likewise no restrictions on forwards, which means that requests to other IP addresses in the same subnet will be forwarded to those machines.
So, if FORWARDS
are being, well, forwarded, why can’t you ping any other machines in the same subnet? Once you know the answer, it’s obvious: it’s not the firewall that’s blocking forward requests. Rather, IP forwarding is a networking feature that must be explicitly enabled in the networking configuration. The article How to enable IP Forwarding will help you get this option configured, but the crux of the change is shown below.
Since you’ll probably want to make this change permanent, execute sudo vi /etc/sysctl.conf
and remove the comment from the front of the line containing net.ipv4.ip_forward = 1
. Restart networking by executing sudo /etc/init.d/networking restart
and you’ll be good to go.
The default network is now set up for smaller installations where everybody has the same permissions everywhere. What if, however, your needs are a little more complex? What if you have some users on your VPN that should only have access to certain resources, i.e. certain ports and protocols?
In that case, you’ll have to use a different approach, perhaps something like the following: (1) close the firewall by default, (2) determine the user connected through OpenVPN and (3) open the firewall selectively for that user.
The first step is to close the firewall by default. As you can see from the iptables
listing above, the firewall accepts all INPUT
connections by default. You’re probably not an expert on iptables
configuration (or you wouldn’t be here). There are two ways to get the settings you need:
Use individual calls to iptables to set up the firewall.
Load an iptables configuration from a dump file.
There’s really not much difference, but this tutorial opted for the second option. Once you’ve got a default firewall set up to your liking, use iptables-save
to dump out the rules to a file named /etc/iptables.uprules
(naturally, you can use whatever file name you like; it just has to match the reference from the script below). If this is all very confusing, the values below set up a closed firewall for you, which is probably what you want.
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i eth0 -j ACCEPT
-A INPUT -i lo -j ACCEPT
COMMIT
Though FORWARD
and OUTPUT
are still accepted unconditionally, all requests to INPUT
are dropped. The two rules for eth0
and lo
make sure that the machine can communicate with itself. Now that you’ve got the rules you need, you want to somehow alter the default configuration of the firewall.
If you guessed that the next step is to edit /etc/iptables/default.conf
or /etc/default/iptables.conf
, you’d be wrong. That’s pretty intuitive, but wrong. On the latest versions of Ubuntu, networking setup like firewall configuration is best accomplished by adding a script that is executed just before the networking interface is established. This guarantees that the default firewall rules are in place before the network is in any way accessible. To do this, add a file called iptables.sh
to the /etc/network/if-pre-up.d/
folder and add the following lines to it:
#!/bin/sh
iptables-restore < /etc/iptables.uprules
exit 0
This is a super-simple script that loads the firewall configuration from the file you just created above. The iptables-restore
command is convenient because it replaces the whole configuration, so you don’t have to do any resetting of your own.
Save the file and execute sudo chmod +x /etc/network/if-pre-up.d/iptables.sh
to make it executable. Restart networking by executing sudo /etc/init.d/networking restart
.
A call to sudo iptables -nL
should now elicit the following output (the main changes are highlighted):
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Congratulations! You’ve succeeded in locking out everybody again, but in a different way.
How do you get back that coveted VIP status that you had just seconds ago? Now you’re up to step (2) above: “Determine the user connected through OpenVPN”. The basic strategy here is to key on the unique name in the SSL certificate authorized by OpenVPN. For each different group of permissions (IP addresses/ports/protocols) that you want to grant, create a file with the names of people who belong to that group, one name per line. For example:
Joe_Jackson
Phil_Hartman
Jill_Meikenson
Horst_Buchholz
Susan_B_Lazy
This is just one very simple solution to the problem of determining membership. Some installations with much larger user bases might want to instead bind to an external lookup using LDAP or an already existing MySQL database or something similar. That’s obviously beyond the scope of this tutorial, though.
You’re now going to need a script that will use these lists to determine which firewall rules to execute. I’ve added the general form of that script below, with matching for “employees” and “strangers” and TODO statements indicating where you need to extend the script for your own purposes:
function inlist
{
less `dirname $0`/$1 | egrep "^${CLIENTCERT}$" > /dev/null
if [ $? -eq 0 ]; then
return 0
else
return 1
fi
}
function get_next_matching_firewall_rule
{
ip_address=$1
RULE="`iptables -L INPUT -n --line-numbers | grep $ip_address | head -n 1`"
}
function drop_rule_from_iptables
{
rule="$1"
echo " Drop rule [$rule]"
line_number=`echo "$rule" | awk '{print $1}'`
iptables -D INPUT $line_number
}
function add_port_to_iptables
{
source_ip=$1
destination_ip=$2
protocol=$3
port=$4
iptables -A INPUT -i tun0 -s $source_ip -d $destination_ip -p $protocol --dport $port -j ACCEPT
}
function add_destination_to_iptables
{
source_ip=$1
destination_ip=$2
iptables -A INPUT -i tun0 -s $source_ip -d $destination_ip -j ACCEPT
}
function open_firewall_for_strangers
{
echo " Add route for DNS"
add_port_to_iptables $CLIENTIP 192.168.1.1 "UDP" 53
echo " Add route for Windows shares"
add_port_to_iptables $CLIENTIP 192.168.1.5 "TCP" 139
add_port_to_iptables $CLIENTIP 192.168.1.5 "TCP" 445
return 0
}
function open_firewall_for_employees
{
echo " Add routes for all ip addresses"
iptables -A INPUT -i tun0 -s $CLIENTIP -j ACCEPT
return 0
}
function open_firewall
{
echo "Opening firewall for $CLIENTCERT @ [$CLIENTIP]"
# TODO Add filtering for other lists, if desired
# inlist "MYGROUP.list"
#if [ $? -eq 0 ]; then
# echo " Certificate found in MYGROUP list"
# open_firewall_for_MYGROUP
# return 0
#else
inlist "strangers.list"
if [ $? -eq 0 ]; then
echo " Certificate found in strangers list"
open_firewall_for_strangers
return 0
else
inlist "employees.list"
if [ $? -eq 0 ]; then
echo " Certificate found in employee list"
open_firewall_for_employees
return 0
else
echo " Certificate not found in any list"
return 1
fi
fi
}
function close_firewall
{
echo "Closing firewall for [$CLIENTIP]"
get_next_matching_firewall_rule $CLIENTIP
while [ -n "$RULE" ]
do
drop_rule_from_iptables "$RULE"
get_next_matching_firewall_rule $CLIENTIP
done
}
# Main
OPERATION=$1
CLIENTIP=$2
CLIENTCERT=$3
case "$1" in
add)
close_firewall
open_firewall
;;
update)
close_firewall
open_firewall
;;
delete)
close_firewall
;;
*)
echo "Unknown operation"
exit 1
esac
exit $?
Some explanation for those who haven’t scripted in bash much before:
The operation to perform is determined by the case statement at the end of the script. Note that in all recognized cases, the firewall is first closed just to make sure that there are no lingering entries for the given client’s IP address.
close_firewall simply removes all rules for the given client’s IP address, in which case the default DROP action on INPUTS will block all incoming traffic from the address.
open_firewall tries to find the user in one of the files. If successful, the rules for that file are applied to the firewall.

Finally, you need to tell OpenVPN to run your script whenever it has authorized a connection. Execute sudo vi /etc/openvpn/server.conf
and add or modify the following line:
learn-address /etc/openvpn/configfirewall.sh
Restart OpenVPN with sudo /etc/init.d/openvpn restart
and you’re done! Your OpenVPN server now not only authorizes users but also locks down the firewall to allow only those services for which a user has permission.
Finally, here are samples of all of the files used in this tutorial.
Published by marco on 27. Oct 2009 16:52:19 (GMT-5)
Updated by marco on 27. Oct 2009 16:52:51 (GMT-5)
Encodo Systems AG [1] has released Quino 1.1.0.0 to licensed customers; test licenses are available on request. Feel free to contact them at “info [at] encodo [dot] ch”. Read the Quino Fact Sheet for an in-depth overview.
Big, new features include:
More information is available at the Quino home page, including the original Metadata in Software Development paper as well as the aforementioned Quino Fact Sheet (excerpted below).
“What is Quino? Quino is a metadata framework written in C# 3.5. How does a metadata framework differ from an application framework? Application frameworks generally dictate much of the infrastructure of an application. An application can extend the framework only if it offers some way of extending it; if not, the application developer is often left without help at all. If a developer wants to extend or improve the user interface, for example, they have to work within the bounds defined by the user interface support provided by the application framework.
“Quino, being a metadata framework, is different. The philosophy behind Quino is that metadata is great. Metadata enables generic programming and allows a developer to write or generate entire swathes of your application with very little effort—and great results—and therefore free up precious time to fine-tune the parts of the application that make it really stand out. It’s about spending time on the stuff that really matters instead of down in the trenches, connecting to databases, marshalling objects or painstakingly placing controls on forms or web pages.”
Published by marco on 19. Oct 2009 21:45:58 (GMT-5)
Version 1.5.2 of the Encodo C# Handbook is now available for download. It includes the following updates:
It’s also available for download at the MSDN Code Gallery.
Published by marco on 18. Oct 2009 15:59:43 (GMT-5)
Updated by marco on 19. Oct 2009 07:13:11 (GMT-5)
DSL is a buzzword that’s been around for a while and it stands for [D]omain-[S]pecific [L]anguage. That is, some tasks or “domains” are better described with their own language rather than using the same language for everything. This gives a name to what is actually already a standard practice: every time a program assumes a particular format for an input string (e.g. CSV or configuration files), it is using a DSL. On the surface, it’s extremely logical to use a syntax and semantics most appropriate to the task at hand; it would be hard to argue with that. However, that’s assuming that there are no hidden downsides.
And the downsides are not inconsequential. As an example, let’s look at the DSL “Linq”, which arrived with C# 3.5. What’s the problem with Linq? Well, nothing, actually, but only because a lot of work went into avoiding the drawbacks of DSLs. Linq was written by Microsoft and they shipped it at the same time as they shipped a new IDE—Visual Studio 2008—which basically upgraded Visual Studio 2005 in order to support Linq. All of the tools to which .NET developers have become accustomed worked seamlessly with Linq.
However, it took a little while before JetBrains released a version of ReSharper that understood Linq…and that right there is the nub of the problem. Developer tools need to understand a DSL or you might as well just write it in Notepad. [1] The bar for integration into an IDE is quite high: developers expect a lot these days, including:
What sounds, on the surface, like a slam-dunk of an idea, suddenly sounds like a helluva lot more work than just defining a little language [3]. That’s why Encodo decided early on to just use C# for everything in its Quino framework, wherever possible. The main part of a Quino application is its metadata, or the model definition. However, instead of coming up with a language for defining the metadata, Encodo lets the developer define the metadata using a .NET-API, which gives that developer the full power of code-completion, ReSharper and whatever other goodies they may have installed to help them get their jobs done.
Deciding to use C# for APIs doesn’t mean, however, that your job is done quickly: you still have to design an API that not only works, but is intuitive enough to let developers use it with as little error and confusion as possible.
I recently extended the API for building metadata to include being able to group other metadata into hierarchies called “layouts”. Though the API is implementation-agnostic, its primary use will initially be to determine how the properties of a meta-class are laid out in a form. That is, most applications will want to have more control over the appearance than simply displaying the properties of a meta-class in a form from first-to-last, one to a line.
In the metadata itself, a layout is a group of other elements; an element can be a meta-property or another group. A group can have a caption. Essentially, it should look like this when displayed (groups are surrounded by []; elements with <>):
[MainTab]
-----------------------------------
| <Company>
| [MainFieldSet]
| --------------------------------
| | <Contact>
| | [ <FirstName> <LastName> ]
| | <Picture>
| | <Birthdate>
| --------------------------------
| [ <IsEmployee> <Active> ]
-----------------------------------
From the example above, we can extract the following requirements:
One way of constructing this in a traditional programming language like C# is to create a new group when needed, using a constructor with a caption or not, as needed. However, I also wanted to make a DSL, which has as little cruft as possible; that is, I wanted to avoid redundant parameters and unnecessary constructors. I also wanted to avoid forcing the developer to provide direct references to meta-property elements where it would be more comfortable to just use the name of the property instead.
To that end, I decided to avoid making the developer create or necessarily provide the actual destination objects (i.e. the groups and elements); instead, I would build a parallel set of throwaway objects that the developer would either implicitly or explicitly create. The back-end could then use those objects to resolve references to elements and create the target object-graph with proper error-checking and so on. This approach also avoids getting the target metadata “dirty” with properties or methods that are only needed during this particular style of construction.
I started by writing some code in C# that I thought was both concise enough and offered visual hints to indicate what was being built. That is, I used whitespace to indicate grouping of elements, exactly as in the diagram from the requirements above.
Here’s a simple example, with very little grouping:
builder.AddLayout(
personClass, "Basic",
Person.Relations.Contact,
new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
Person.Fields.Picture,
Person.Fields.Birthdate,
new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
);
The code above creates a new “layout” for the class personClass
named “Basic”. That takes care of the first two parameters; the much larger final parameter is an open array of elements. These are primarily the names of properties to include from personClass
(or they could also be the properties themselves). In order to indicate that two properties are on the same line, the developer must group them using a LayoutGroup
object.
Here’s a more complex sample, with nested groups (this one corresponds to the original requirement from above):
builder.AddLayout(
personClass, "Details",
new LayoutGroup("MainTab",
Person.Relations.Company,
new LayoutGroup("MainFieldSet",
Person.Relations.Contact,
new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
Person.Fields.Picture,
Person.Fields.Birthdate
),
new LayoutGroup(Person.Fields.IsEmployee, Person.Fields.Active)
)
);
In this example, we see that the developer can also use a LayoutGroup
to attach a caption to a group of other items, but that otherwise everything pretty much stays the same as in the simpler example.
Finally, a developer should also be able to refer to other layout definitions in order to avoid repeating code (adhering to the D.R.Y. principle [4]). Here’s the previous example redefined using a reference to another layout (highlighted):
builder.AddLayout(
personClass, "Basic",
Person.Relations.Contact,
new LayoutGroup(Person.Fields.FirstName, Person.Fields.LastName),
Person.Fields.Picture,
Person.Fields.Birthdate
);
builder.AddLayout(
personClass, "Details",
new LayoutGroup("MainTab",
Person.Relations.Company,
new LayoutGroup("MainFieldSet",
new LayoutReference("Basic")
),
new LayoutItems(Person.Fields.IsEmployee, Person.Fields.Active)
)
);
Now that I had an API I thought was good enough to use, I had to figure out how to get the C# compiler to not only accept it, but also to give me the opportunity to build the actual target metadata I wanted.
The trick ended up being to define a few objects for the different possibilities—groups, elements, references, etc.—and make them implicitly convert to a basic LayoutItem
. Using implicit operators allowed me to even convert strings to meta-property references, like this:
public static implicit operator LayoutItem(string identifier)
{
return new LayoutItem(identifier);
}
Each of these items has a reference to each possible type of data and a flag to indicate which of these data are valid and can be extracted from this item. The builder receives a list of such items, each of which may have a sub-list of other items. Processing the list is now as simple as iterating them with foreach
, something like this:
private void ProcessItems(IMetaGroup group, IMetaClass metaClass, LayoutItem[] items)
{
foreach (var item in items)
{
if (!String.IsNullOrEmpty(item.Identifier))
{
var element = metaClass.Properties[item.Identifier];
group.Elements.Add(element);
}
else if (item.Items != null)
{
var subGroup = CreateNextSubGroup(group);
group.Elements.Add(subGroup);
ProcessItems(subGroup, metaClass, item.Items.Items);
}
else if (item.Group != null)
{
…
}
else (…)
}
}
If the item was created from a string, the builder looks up the property to which it refers in the meta-class and adds that to the current group. If the item corresponds to an anonymous group, the builder creates a new group and adds the items to it recursively. Here we can see how this solution spares the application developer the work of looking up each and every referenced property in application code. Instead, the developer’s code stays clean and short.
Naturally, my solution has many more cases but the sample above should suffice to show how the full solution works.
The story didn’t just end there, as there are limitations to forcing C# to do everything we’d like. The primary problem came from distinguishing the string that is the caption from the strings that are references to meta-properties. To avoid this problem, I was forced to introduce a LayoutItems class for anonymous groups and reserve LayoutGroup for groups with captions.
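The constructor shapes below are assumptions that illustrate the distinction; the article doesn't show the actual declarations:
class LayoutGroup
{
    // The first string is always the caption; everything else is content.
    public LayoutGroup(string caption, params LayoutItem[] items)
    {
        // ...
    }
}

class LayoutItems
{
    // No caption: any string among the items is a reference to a meta-property.
    public LayoutItems(params LayoutItem[] items)
    {
        // ...
    }
}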
I was not able to get the implementation to support my requirements exactly as I’d designed them, but it ended up being pretty close. Below is the first example from the requirements, but changed to accommodate the final API; all changes are highlighted.
builder.AddLayout(
personClass, "Details",
new LayoutGroup("MainTab", new LayoutItems(
Person.Relations.Company,
new LayoutGroup("MainFieldSet", new LayoutItems(
Person.Relations.Contact,
new LayoutItems(Person.Fields.FirstName, Person.Fields.LastName),
Person.Fields.Picture,
Person.Fields.Birthdate
)),
new LayoutItems(Person.Fields.IsEmployee, Person.Fields.Active)
))
);
All in all, I’m pretty happy with how things turned out: the API is clear enough that the developer should be able to both visually debug the layouts and easily adjust them to accommodate changes. For example, it’s quite obvious how to add a new property to a group, move a property to another line or put several properties on the same line. Defining this pseudo-DSL in C# lets the developer use code-completion, popup documentation and the full power of ReSharper and frees me from having to either write or maintain a parser or development tools for a DSL.
Published by marco on 18. Oct 2009 13:24:58 (GMT-5)
A usable API doesn’t usually spring forth in its entirety on the first try. A good, usable API generally arises iteratively, improving over time. Naturally, when using words like good and usable, I’m obliged to define what exactly I mean by that. Here are the guidelines I use when designing an API, in decreasing order of importance:
Using those guidelines, I designed an API to manage bits and sets of bits in C#. Having spent a lot of time using Delphi Pascal, I’d become accustomed to set and bit operations with static typing. In C#, the .Net framework provides the Set<T> generic type, but that seems like overkill when the whole idea behind using bits is to use less space. That means using enumerated types and the FlagsAttribute
; however, there are some drawbacks to using the native bit-operations directly in code:
To demonstrate, here is a sample:
[Flags]
enum TestValues
{
None = 0,
One = 1,
Two = 2,
Three = 4,
Four = 8,
All = 15,
}
// Set bits one and two:
var bitsOneAndTwo = TestValues.One | TestValues.Two;
// Remove bit two :
var bitOneOnly = bitsOneAndTwo & ~TestValues.Two;
// Testing for bit two:
if ((bitsOneAndTwo & TestValues.Two) == TestValues.Two)
{
…
}
As you can see in the example above, setting a bit is reasonably intuitive (though it’s understandable to get confused about using |
instead of &
to combine bits). Removing a bit is more esoteric, as the combination of &
with the ~
(inverse) operator is easily forgotten if not often used. Testing for a bit is quite verbose and extending to testing for one of several flags even more so.
Therefore, to make things easier, I decided to make some extension methods for these various functions and ended up with something like the following:
public static void Include<T>(this T flags, T value) { … }
public static void Exclude<T>(this T flags, T value) { … }
public static bool In<T>(this T flags, T value) { … }
public static void ForEachFlag<T>(this T flags, Action<T> action) { … }
These definitions compiled and worked as expected, but had the following major drawbacks:
The methods were intended only for enum values, but code completion was offering them for all objects because there was no generic constraint on T.
The ForEachFlag() function was implemented as a lambda when it is clearly an iteration. Using a lambda makes it impossible to use break or continue with this method.

This version, although it worked, broke several of the rules outlined above; namely: while it did offer compile-time checking, the implementation had a lot of repetition in it and the iteration did not make use of the common library enumeration support (IEnumerable
and foreach
). That the operations were available for all objects and polluted code-completion only added insult to injury.
A natural solution to the namespace-pollution problem is to add a generic constraint to the methods, restricting the operations to objects of type Enum
, as follows:
public static void Include<T>(this T flags, T value)
where T : Enum
{ … }
public static void Exclude<T>(this T flags, T value)
where T : Enum
{ … }
public static bool In<T>(this T flags, T value)
where T : Enum
{ … }
public static void ForEachFlag<T>(this T flags, Action<T> action)
where T : Enum
{ … }
.NET enum
-declarations, however, do not inherit from Enum
; instead, they inherit from Int32
, by default, but can also inherit from a handful of other base types (e.g. byte
, Int16
). This makes sense so that enum
-values can be freely converted to and from these base values. Not only will a generic constraint as defined above not have the intended effect, it’s explicitly disallowed by the compiler. So, that’s a dead-end.
The other, more obvious way of restricting the target type of an extension method is to change the type of the first parameter from T
to something else. However, since enum
types don’t inherit from Enum
, what type do we use? Well, it turns out that Enum
is a strange type, indeed. It can’t be used in a generic constraint and does not serve as the base class for enumerated types but, when used as the target of an extension method, it magically applies to all enumerated types!
I took advantage of this loophole to build the next version of the API, as follows:
public static void Include<T>(this Enum flags, T value) { … }
public static void Exclude<T>(this Enum flags, T value) { … }
public static bool In<T>(this Enum flags, T value) { … }
public static void ForEachFlag<T>(this Enum flags, Action<T> action) { … }
This version had two advantages over the first version:
The methods now appeared in code-completion only for enumerated types, rather than for all objects.
The implementation could use the Enum.GetTypeCode() method instead of the is and as-operators to figure out the type and cast the input accordingly.

After using this version for a little while, it became obvious that there were still problems with the implementation:
Though using Enum as the target type of the extension method was a clever solution, it turns out to be a huge violation of the first design-principle outlined above: the type T for the other parameters is not guaranteed to conform to Enum. That is, the compiler cannot statically verify that the bit being checked (value) is of the same type as the bit-set (flags).
The operations are only available for Enum objects, where they would also be appropriate for Int32 and Int64 objects and so on.
The ForEach method still has the same problems it had in the first version; namely, that it doesn’t allow the use of break and continue and therefore violates the second design-principle above.

A little more investigation showed that the Enum.GetTypeCode()
method is not unique to Enum
but implements a method initially defined in the IConvertible
interface. And, as luck would have it, this interface is implemented not only by the Enum
class, but also by Int32
, Int64
and all of the other types to which we would like to apply bit- and set-operations.
Knowing that, we can hope that the third time’s a charm and redesign the API once again, as follows:
public static void Include<T>(this T flags, T value)
where T : IConvertible
{ … }
public static void Exclude<T>(this T flags, T value)
where T : IConvertible
{ … }
public static bool In<T>(this T flags, T value)
where T : IConvertible
{ … }
public static void ForEachFlag<T>(this T flags, Action<T> action)
where T : IConvertible
{ … }
Now we have methods that apply only to those types that support set- and bit-operations (more or less [1]). Not only that, but the value and action arguments are once again guaranteed to be statically compliant with the flags
arguments.
With two of the drawbacks eliminated with one change, we converted the ForEachFlag
method to return an IEnumerable<T>
instead, as follows:
public static IEnumerable<T> GetEnabledFlags<T>(this T flags)
where T : IConvertible
{ … }
The result of this method can now be used with foreach
and works with break
and continue
, as expected. Since the method also now applies to non-enumerated types, we had to re-implement it to return the set of possible bits for the type instead of simply iterating the possible enumerated values returned by Enum.GetValues()
. [2]
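The implementation itself isn’t reproduced here; purely as a sketch of the idea, an iteration over individual bits might look something like the following (the 64-bit normalization and the culture handling are assumptions of this sketch, not details of the original BitTools):
public static IEnumerable<T> GetEnabledFlags<T>(this T flags)
  where T : IConvertible
{
  // Normalize to a 64-bit value; unsigned values with the high bit set are ignored here.
  var bits = Convert.ToInt64(flags, CultureInfo.InvariantCulture);
  for (var index = 0; index < 64; index++)
  {
    var mask = 1L << index;
    if ((bits & mask) != 0)
    {
      // Convert the single set bit back to the original base type; unboxing the result
      // as T also works for enumerated types with a matching underlying type.
      yield return (T)Convert.ChangeType(mask, flags.GetTypeCode(), CultureInfo.InvariantCulture);
    }
  }
}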
This version satisfies the first design principles (statically-typed, standard practice, elegant) relatively well, but is still forced to make concessions in implementation and CLS-compliance. It turns out that the IConvertible
interface is somehow not CLS-compliant, so I was forced to mark the whole class as non-compliant. On the implementation side, I was avoiding the rather clumsy is
-operator by using the IConvertible.GetTypeCode()
method, but still had a lot of repeated code, as shown below in a sample from the implementation of Is
:
switch (flags.GetTypeCode())
{
case TypeCode.Byte:
return (byte)(object)flags == (byte)(object)value;
case TypeCode.Int32:
return (int)(object)flags == (int)(object)value;
…
}
Unfortunately, bit-testing is so low-level that there is no (obvious) way to refine this implementation further. In order to compare the two convertible values, the compiler must be told the exact base type to use, which requires an explicit cast for each supported type, as shown above. Luckily, this limitation is in the implementation, which affects the maintainer and not the user of the API.
Since implementing the third version of these “BitTools”, I’ve added support for Is
(shown partially above), Has
, and HasOneOf, and it looks like the third time might indeed be the charm, as the saying goes.
[1] The IConvertible interface is actually implemented by other types, to which our bit-operations don’t apply at all, like double, bool and so on. The .NET library doesn’t provide a more specific interface—like “INumeric” or “IIntegralType”—so we’re stuck constraining to IConvertible instead.

[2] Which, coincidentally, fixed a bug in the first and second versions that had returned all detected enumerated values—including combinations—instead of individual bits. For example, given the type shown below, we only ever expect values One and Two, and never None, OneOrTwo or All.
[Flags]
enum TestValues
{
None = 0,
One = 1,
Two = 2,
OneOrTwo = 3,
All = 3,
}
That is, foreach (var flag in TestValues.Two.GetEnabledFlags()) { … } should return only Two and foreach (var flag in TestValues.All.GetEnabledFlags()) { … } should return One and Two.
Published by marco on 17. Oct 2009 22:56:11 (GMT-5)
Updated by marco on 26. Oct 2021 12:17:00 (GMT-5)
C# 3.5 has a limitation where generic classes don’t necessarily conform to each other in the way that one would expect. This problem manifests itself classically in the following way:
class D { }
class E : D { }
class F : D { }
class Program
{
void ProcessListOfD(IList<D> list) { }
void ProcessListOfE(IList<E> list) { }
void ProcessSequenceOfD(IEnumerable<D> sequence) { }
void ProcessSequenceOfE(IEnumerable<E> sequence) { }
void Main()
{
var eList = new List<E>();
var dList = new List<D>();
ProcessListOfD(dList); // OK
ProcessListOfE(dList); // Compiler error, as expected
ProcessSequenceOfD(dList); // OK
ProcessSequenceOfE(dList); // Compiler error, as expected
ProcessListOfD(eList); // Compiler error, unexpected!
ProcessListOfE(eList); // OK
ProcessSequenceOfD(eList); // Compiler error, unexpected!
ProcessSequenceOfE(eList); // OK
}
}
Why are those two compiler errors unexpected? Why shouldn’t a program be able to provide an IList<E>
where an IList<D>
is expected? Well, that’s where things get a little bit complicated. Whereas at first, it seems that there’s no down side to allowing the assignment—E
can do everything expected of D
, after all—further investigation reveals a potential source of runtime errors.
Expanding on the example above, suppose ProcessListOfD()
were to have the following implementation:
void ProcessListOfD(IList<D> list)
{
if (SomeCondition(list))
{
list.Add(new F());
}
}
With such an implementation, the call to ProcessListOfD(eList), which passes an IList<E>, would cause a runtime error if SomeCondition()
were to return true
. So, the dilemma is that allowing co- and contravariance may result in runtime errors.
A language design includes a balance of features that permit good expressiveness while restricting bad expressiveness. C# has implicit conversions, but requires potentially dangerous conversions to be made explicit with casts. Similarly, the obvious type-compatibility outlined in the first example is forbidden and requires a call to the System.Linq.Enumerable.Cast<T>(this IEnumerable)
method instead. Other languages—most notably Eiffel—have always allowed the logical conformance between generic types, at the risk of runtime errors. [1]
Some of these limitations will be addressed in C# 4.0 with the introduction of covariance. See Covariance and Contravariance (C# and Visual Basic) (MSDN) and LINQ Farm: Covariance and Contravariance in C# 4.0 for more information.
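As a quick sketch (using the classes and methods from the first example; this snippet is not from the article itself), the C# 4.0 declaration IEnumerable<out T> makes the sequence cases compile, while IList<T> stays invariant:
IEnumerable<E> eSequence = new List<E>();
IEnumerable<D> dSequence = eSequence; // OK in C# 4.0: IEnumerable<out T> is covariant
ProcessSequenceOfD(eSequence); // also OK, no conversion required
IList<D> anotherList = new List<E>(); // still a compile error: IList<T> remains invariant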
Until then, there’s the aforementioned System.Linq.Enumerable.Cast<T>(this IEnumerable)
method in the system library. However, that method, while very convenient, makes no effort to statically verify that the input and output types are compatible with one another. That is, a call such as the following is perfectly legal:
var numbers = new [] { 1, 2, 3, 4, 5 };
var objects = numbers.Cast<object>(); // OK
var strings = numbers.Cast<string>(); // runtime error (when enumerated)!
Instead of an unchecked cast, a method with a generic constraint on the input and output types would be much more appropriate in those situations where the program is simply avoiding the generic-typing limitation described in detail in the first section. The method below does the trick:
public static IEnumerable<TOutput> Convert<TInput, TOutput>(this IEnumerable<TInput> input)
where TInput : TOutput
{
if (input == null) { throw new ArgumentNullException("input"); }
if (input is IList<TOutput>) { return (IList<TOutput>)input; }
return input.Select(obj => (TOutput)(object)obj);
}
While it’s entirely possible that the Cast()
function from the Linq library is more highly optimized, it’s not as safe as the method above. A check with Redgate’s Reflector would probably reveal just how that method actually works. Correctness come before performance, but YMMV. [2]
The initial examples can now be rewritten to compile without casting:
ProcessListOfD(eList.Convert<E, D>().ToList()); // OK (materialized into a list)
ProcessListOfE(eList); // OK
ProcessSequenceOfD(eList.Convert<E, D>()); // OK
ProcessSequenceOfE(eList); // OK
Unlike the Enumerable.Cast<TOutput>()
method, which has no restrictions and can be used on any IEnumerable
, there will be places where the compiler will not allow an application to use Convert<TOutput>()
. This is because the generic constraint to which TOutput
must conform (TInput
) is, in some cases, not statically provable (i.e. at compile-time). A concrete example is shown below:
abstract class A
{
public abstract IList<TResult> GetObject<TResult>();
}
class B<T> : A
{
public override IList<TResult> GetObject<TResult>()
{
return _objects.Convert<T, TResult>(); // Compile error!
}
private IList<T> _objects;
}
The example above does not compile because TResult
does not provably conform to T
. A generic constraint on TResult
cannot be applied because it would have to be applied to the original, abstract function, which knows nothing of T
. In these cases, the application will be forced to use the System.Linq.Enumerable.Cast<T>(this IEnumerable)
method instead.
[1] … recast definition. Similarly, another runtime plague—null-references—is also addressed in Eiffel, a feature extensively documented in the paper, Attached types and their application to three open problems of object-oriented programming.
Published by marco on 15. Oct 2009 23:18:21 (GMT-5)
Updated by marco on 16. Oct 2009 09:35:31 (GMT-5)
Fluent interfaces—or “method chaining” as it’s also called—provide an elegant API for configuring objects. For example, the Quino query API provides methods to restrict (Where
or WhereEquals
), order (OrderBy
), join (Join
) and project (Select
) data. The first version of this API was very traditional and applications typically contained code like the following:
var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller");
query.WhereEquals(Person.Fields.FirstName, "Hans");
query.OrderBy(Person.Fields.LastName, SortDirection.Ascending);
query.OrderBy(Person.Fields.FirstName, SortDirection.Ascending);
var contactsTable = query.Join(Person.Relations.ContactInfo);
contactsTable.Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse");
(This example gets all people named “Hans Müller” that live on a street with a name that ends in “Strasse” (case-insensitive) sorted by last name, then first name. Fields
and Relations
refer to constants generated from the Quino metadata model.)
The syntax above is very declarative and relatively easy-to-follow, but is a bit wordy. It would be nice to be able to chain together all of these calls and remove the repeated references to query
. The local variable contactsTable
also seems kind of superfluous here (it is only used once).
A fluent version of the query definition looks like this:
var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller")
.WhereEquals(Person.Fields.FirstName, "Hans")
.OrderBy(Person.Fields.LastName, SortDirection.Ascending)
.OrderBy(Person.Fields.FirstName, SortDirection.Ascending)
.Join(Person.Relations.ContactInfo)
.Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse");
The example uses indenting to indicate that the restriction after the join on the “ContactInfo” table applies to the “ContactInfo” table instead of to the “Person” table. The call to Join
logically returns a reference to the joined table instead of the query itself. However, each such table also has a Query
property that refers to the original query. Applications can use this to “jump” back up and apply more joins, as shown in the example below where the query only returns a person if he or she also works in the London office:
var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller")
.WhereEquals(Person.Fields.FirstName, "Hans")
.OrderBy(Person.Fields.LastName, SortDirection.Ascending)
.OrderBy(Person.Fields.FirstName, SortDirection.Ascending)
.Join(Person.Relations.ContactInfo)
.Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse").Query
.Join(Person.Relations.Office)
.WhereEquals(Office.Fields.Name, "London");
A final example shows how even complex queries over multiple table levels can be chained together into one single call. The following example joins on the “ContactInfo” table to dig even deeper into the data by restricting to people whose web sites are owned by people with at least 10 years of experience:
var query = new Query(Person.Metadata);
query.WhereEquals(Person.Fields.Name, "Müller")
.WhereEquals(Person.Fields.FirstName, "Hans")
.OrderBy(Person.Fields.LastName, SortDirection.Ascending)
.OrderBy(Person.Fields.FirstName, SortDirection.Ascending)
.Join(Person.Relations.ContactInfo)
.Where(ContactInfo.Fields.Street, ExpressionOperator.EndsWithCI, "Strasse")
.Join(ContactInfo.Relations.WebSite)
.Join(WebSite.Relations.Owner)
.Where(Owner.Fields.YearsExperience, ExpressionOperator.GreaterThan, 10).Query
.Join(Person.Relations.Office)
.WhereEquals(Office.Fields.Name, "London");
This API might still be a bit too wordy for some (.NET 3.5 Linq would be less wordy), but it’s refactoring-friendly and it’s crystal-clear what’s going on.
When there’s only one class involved, it’s not that hard to conceive of how this API is implemented: each method just returns a reference to this
when it has finished modifying the query. For example, the WhereEquals
method would look like this:
IQuery WhereEquals(IMetaProperty prop, object value)
{
Where(CreateExpression(prop, value));
return this;
}
This isn’t rocket science and the job is quickly done.
However, what if things in the inheritance hierarchy aren’t that simple? What if, for reasons known to the Quino framework architects, IQuery
actually inherits from IQueryCondition
, which defines all of the restriction and ordering operations? The IQuery
provides projection and joining operations, which can easily just return this
, but what type should the operations in IQueryCondition
return?
The problem area is indicated with question marks in the example below:
public interface IQueryCondition
{
??? WhereEquals(IMetaProperty prop, object value);
}
public interface IQueryTable : IQueryCondition
{
IQueryTable Join(IMetaRelation relation);
}
public interface IQuery : IQueryTable
{
IQueryTable SelectDefaultForAllTables();
}
The IQueryCondition
can’t simply return IQueryTable
because it might be used elsewhere [1], but it can’t return IQueryCondition
either: the table couldn’t perform a join after a restriction, because applying the restriction would have narrowed the fluent interface to an IQueryCondition
instead of an IQueryTable
.
The solution is to make IQueryCondition
generic and pass it the type that it should return instead of hard-coding it.
public interface IQueryCondition<TSelf>
{
TSelf WhereEquals(IMetaProperty prop, object value);
}
public interface IQueryTable : IQueryCondition<IQueryTable>
{
IQueryTable Join(IMetaRelation relation);
}
public interface IQuery : IQueryTable
{
IQueryTable SelectDefaultForAllTables();
}
That takes care of the interfaces, on to the implementation. The standard implementation runs into a small problem when returning the generic type:
public class QueryCondition<TSelf> : IQueryCondition<TSelf>
{
TSelf WhereEquals(IMetaProperty prop, object value)
{
// Apply restriction
return (TSelf)this; // causes a compile error
}
}
public class QueryTable : QueryCondition<IQueryTable>, IQueryTable
{
IQueryTable Join(IMetaRelation relation)
{
// Perform the join
return result;
}
}
public class Query : IQuery
{
IQueryTable SelectDefaultForAllTables()
{
// Perform the select
return this;
}
}
One simple solution to the problem is to cast down to object
and back up to TSelf
, but this is pretty bad practice as it short-circuits the static checker in the compiler and defers the problem to a potential runtime one.
public class QueryCondition<TSelf> : IQueryCondition<TSelf>
{
TSelf WhereEquals(IMetaProperty prop, object value)
{
// Apply restriction
return (TSelf)(object)this;
}
}
In this case, it’s guaranteed by the implementation that this
is compliant with TSelf
, but it would be even better to solve the problem without resorting to the double-cast above. As it turns out, there is a simple and quite elegant solution, using an abstract method called ThisAsTSelf
, as illustrated below:
public abstract class QueryCondition<TSelf> : IQueryCondition<TSelf>
{
TSelf WhereEquals(IMetaProperty prop, object value)
{
// Apply restriction
return ThisAsTSelf();
}
protected abstract TSelf ThisAsTSelf();
}
public class Query : QueryCondition<IQueryTable>, IQuery
{
protected override IQueryTable ThisAsTSelf()
{
return this;
}
}
The compiler is now happy without a single cast at all because Query
returns this
, which the compiler knows conforms to TSelf
. The power of a fluent API is now at your disposal without restricting inheritance hierarchies or making end-runs around the compiler. Naturally, the concept extends to multiple levels of inheritance (e.g. if all calls had to return IQuery
instead of IQueryTable
), but it gets much uglier, as it requires nested generic types in the return types, which makes it much more difficult to understand. With a single level, as in the example above, the complexity is still relatively low and the resulting API is very powerful.
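For the curious, here is a rough sketch of what the multi-level variant alluded to above might look like (illustrative only, not the actual Quino interfaces): the table interface itself must become generic so that every call can keep returning the most-derived type.
public interface IQueryCondition<TSelf>
{
  TSelf WhereEquals(IMetaProperty prop, object value);
}
public interface IQueryTable<TSelf> : IQueryCondition<TSelf>
{
  TSelf Join(IMetaRelation relation);
}
public interface IQuery : IQueryTable<IQuery>
{
  IQuery SelectDefaultForAllTables();
}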
[1] … IQueryJoinCondition.
Published by marco on 26. Jun 2009 00:04:09 (GMT-5)
Updated by marco on 26. Jun 2009 08:51:26 (GMT-5)
The following is an analysis and brainstorming of a problem in generalized database browser GUIs, like those generated by the Quino metadata framework.
Let’s start with the user story that generated this idea:
“A user was entering data using our database software and complained of losing data. After verifying that the lost data was not due to an obvious software bug, we determined that it was because of how she was assuming the software worked. That is, she would use the application to browse to the location where she wanted to add data, create a new object, fill out its fields, then save it. For each subsequent object, she simply filled out the form again and clicked save to save it.”
Now, if you know the paradigm of Quino applications – and, indeed, most modern GUI applications – you’re going to see the problem: instead of creating a new entry every time she clicked save, she was simply saving the current object, then editing that same object, then overwriting the previous changes. After filling out the details for dozens of objects, she had only one object saved in the database.
One fix is to improve her training so that she knows how to create multiple objects with a Quino application. That is what we did, so that she could continue with data-entry and get her work done with the current software. A better workflow for entering new records is to select “new”, fill in the data, then select “new” again to store the existing entry and create a blank form for the next entry. A few problems are immediately obvious:
Anyway, once we’d gotten her squared away, we huddled back at Encodo headquarters and asked ourselves how we could avoid similar problems in the future. We agreed that it was a difficult problem and had to break up after a bit to attend to more pressing matters. The problem continued to swirl around in our collective subconscious, though.
In describing the problem to another, non-technical person with a fair amount of computer experience, the following ideas came up.
As mentioned above, when we create a new object, the data entry form is empty, save for a few default values (set from the model) and the parent object on which the object is being created. However, the user might want to do one of several other things when creating a new object:
In the case of “Cloning” and “Templating”, we run into the danger that cropped up in the user story above; namely, that the form is in the same place as the object being cloned or the last object displayed, but it is now showing either an exact copy or a partially filled object instead. An object that is new and unsaved. How can we let the user know that this object is new and unsaved and, conversely, how can we let the user know that when they are making edits to an existing object, they are not saving a new object, but modifying data, which includes replacing existing information with new information?
One way to handle this problem is to leave the GUI as it is, but to use color or decal hinting to let the user know the object state. We could do this in several ways:
Another way to handle the problem is to separate the tasks of editing existing objects and creating new ones. A paradigm to which users are well-accustomed is the dialog box “Ok/Cancel” one. Open a dialog, fill in the data and click ok to save it or cancel to abort. The way the Quino data browser works right now is that Ok and Cancel manifest as “Save” and “Revert” in the toolbar (which is not so clearly connected to the object being edited or created). This is not really that intuitive, especially when considering that editable objects can be nested.
The concept of navigating to an object and editing it in place is a good one, and one which seems to cause little trouble for users. What if, however, we were to change the data-entry mode to use a separate window instead? Instead of simply loading the new, empty object into the panel where existing objects are edited, we open a modal dialog showing the new, empty object. There is little room for error as the user must select “Ok” or “Cancel” to exit the dialog, making an explicit choice to save or discard the new object. The dialog cannot be closed with “Ok” unless the object validates successfully. When the object is saved and the dialog closes, the form from which the dialog was opened is focused on the new object, which appears in the browser in the tree(s) and/or list(s) where it belongs.
The feature above is quite a bare-bones approach and we can do much better. For example, we could offer the following improvements:
It seems that, with such an approach, Quino would offer a much more streamlined and intuitive method of mass or single data entry with far less of a chance of users getting confused by the combination of the global toolbar, auto-saving and the mix of browsing and data-entry.
Published by marco on 21. Jun 2009 14:10:45 (GMT-5)
After what seems like an eternity, a mainstream programming language will finally dip its toe in the Design-by-contract (DBC) pool. DBC is a domain amply covered in one less well-known language called Eiffel (see ISE Eiffel Goes Open-Source for a good overview), where preconditions, postconditions and invariants of various stripes have been available for over twenty years.
Object-oriented languages already include contracts; “classic” signature-checking involves verification of parameter counts and type-conformance. DBC generally means extending this mechanism to include assertions on a higher semantic level. A method’s signature describes the obligations calling code must fulfill in order to execute the method. The degree of enforcement varies from language to language. Statically-typed languages verify types according to conformance at compile-time, whereas dynamically-typed languages do so at run-time. Even the level of conformance-checking differs from language to language, with statically-typed languages requiring hierarchical conformance via ancestors and dynamically-typed languages verifying signatures via duck-typing.
And that’s only for individual methods; methods are typically collected into classes that also have a semantic meaning. DBC is about being able to specify the semantics of a class (e.g. can property A
ever be false
when property B
is true
?) as well as those of method parameters (can parameter a
ever be null
?) using the same programming language.
DBC is relatively tedious to employ without framework or language support. Generally, this takes the form of using Debug.Assert
[1] at the start of a method call to verify arguments, throwing ArgumentExceptions
when the caller did not satisfy the contract. Post-conditions can also be added in a similar fashion, at the end of the function. Naturally, without library support, post-conditions must be added before any return
-statements or enclosed in an artificial finally
-clause around the rest of the method body. Class invariants are even more tedious, as they must be checked both at the beginning and end of every single “entering” method call, where the “entering” method call is the first on the given object. A proper implementation may not check the invariant for method calls that an object calls on itself because it’s perfectly all right for an object to be in an invalid state until the “entering” method returns.
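As a concrete illustration of this poor-man’s approach (the Account class is invented for the example), the checks end up interleaved with the method body:
public class Account
{
  private decimal _balance;

  public void Withdraw(decimal amount)
  {
    CheckInvariant(); // class invariant, checked on entering the object
    if (amount <= 0) { throw new ArgumentException("amount must be positive", "amount"); }
    if (amount > _balance) { throw new ArgumentException("amount exceeds the balance", "amount"); }
    var originalBalance = _balance;

    _balance -= amount;

    Debug.Assert(_balance == originalBalance - amount, "postcondition violated");
    CheckInvariant(); // ...and checked again on the way out
  }

  private void CheckInvariant()
  {
    Debug.Assert(_balance >= 0, "invariant violated: the balance may never be negative");
  }
}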
One assertion that arises quite often is that of requiring that a parameter be non-null
in a precondition. An analysis of most code bases that used poor-man’s DBC will probably reveal that the majority of its assertions are of this form. Therefore, it would be nice to handle this class of assertion separately using a language feature that indicates that a particular type can statically never be null
. Eiffel has added this support with a separate notation for denoting “attached” types (types that are guaranteed to be attached to a non-null
reference). Inclusion of such a feature not only improves the so-called “provability” of programs written in that language, it also transforms null-checking contracts to another notation (e.g. in Eiffel, objects are no longer nullable by default and the ?
-operator is used to denote nullability) and removes much of the clutter from the precondition block.
Without explicit language support, a DBC solution couched in terms of assertions and/or exceptions quickly leads to clutter that obscures the actual program logic. Contracts should be easily recognizable as such by both tools and humans. Ideally, the contract can be extracted and included in documentation and code completion tooltips. Eiffel provides such support with separate areas for pre- and post-conditions as well as class invariants. All assertions can be labeled to give them a human-readable name, like “param1_not_null” or “list_contains_at_most_one_element”. The Eiffel tools provide various views on the source code, including what they call the “short” view, showing method signatures and contracts without implementation, as well as the “short flat” view, which is the “short” view, but includes all inherited methods to present the full interface of a type.
Other than Eiffel, no close-to-mainstream programming language [2] has attempted to make the implicit semantics of a class explicit with DBC. Until now. Code Contracts will be included in C# 4.0, which will be released with Visual Studio 2010. It is available today as a separate assembly and compatible with C# 3.5 and Visual Studio 2008, so no upgrade is required to start using it. Given the lack of an upgrade requirement, we can draw the conclusion that this contracting solution is library-only without any special language support.
That does not bode well; as mentioned above, such implementations will be limited in their support of proper DBC. The user documentation provides an extensive overview of the design and proper use of Code Contracts.
There are, as expected, no new keywords or language support for contracts in C# 4.0. That means that tools and programmers will have to rely on convention in order to extract semantic meaning from the contracts. Pre- and postconditions are mixed together at the top of the method call. Post-conditions have support for accessing the method result and original values of arguments. Contracts can refer to fields not visible to other classes and there is an attribute-based hack to make these fields visible via a proxy property.
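For reference, a minimal (invented) method using the library looks something like this; Contract.Requires, Contract.Ensures, Contract.Result and Contract.OldValue are the actual Code Contracts calls:
public int Increment(int value)
{
  Contract.Requires(value < int.MaxValue);
  Contract.Ensures(Contract.Result<int>() == Contract.OldValue(value) + 1);

  return value + 1;
}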
Contracts for abstract classes and interfaces are, simply put, a catastrophe. Since these constructs don’t have method implementations, they can’t contain contracts. Therefore, in order to attach contracts to these constructs—and, to be clear, the mechanism would be no improvement over the current poor-man’s DBC if there was no way to do this—there is a ContractClass
attribute. Attaching contracts to an interface involves making a fake implementation of that interface, adding contracts there, hacking expected results so that it compiles, presumably adding a private constructor so it can’t be instantiated by accident, then referencing it from the interface via the attribute mentioned above. It works, but it’s far from pretty and it moves the contracts far from the place where it would be intuitive to look for them.
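A sketch of that mechanism, with an invented ICalculator interface, looks roughly like the following; [ContractClass] and [ContractClassFor] are the real attributes from System.Diagnostics.Contracts:
[ContractClass(typeof(CalculatorContracts))]
public interface ICalculator
{
  int Divide(int dividend, int divisor);
}

[ContractClassFor(typeof(ICalculator))]
internal abstract class CalculatorContracts : ICalculator
{
  public int Divide(int dividend, int divisor)
  {
    Contract.Requires(divisor != 0);
    return default(int); // dummy result so that the fake implementation compiles
  }
}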
Just as the specification side is not so pretty, the execution side also suffers. Contracts are, at least, inherited, but preconditions cannot be weakened. That is, a sub-type—and implementations of interfaces with contracts are sub-types—cannot add preconditions; end of story. As soon as a type contains at least one contract on one method, all methods in that type without contracts are interpreted as specifying the “empty” contract.
Instead of simply acknowledging that precondition weakening could be a useful feature, the authors state:
“While we could allow a weaker precondition, we have found that the complications of doing so outweigh the benefits. We just haven’t seen any compelling examples where weakening the precondition is useful.”
Let’s have an example, where we want to extend an existing class with support for a fallback mechanism. In the following case we have a transmitter class that sends data over a server; the contracts require that the server be reachable before sending data. The descendant adds support for a second server over which to send, should the first be unreachable. All examples below have trimmed initialization code that guarantees non-null
properties for clarity’s sake. All contracts are included.
class Transmitter
{
public Server Server { get; }
public virtual void SendData(Data data)
{
Contract.Requires(data != null);
Contract.Requires(Server.IsReachable);
Contract.Ensures(data.State == DataState.Sent);
Server.Send(data);
}
[ContractInvariantMethod]
protected void ObjectInvariant()
{
Contract.Invariant(Server != null);
}
}
class TransmitterWithFallback : Transmitter
{
public Server FallbackServer { get; }
public override void SendData(Data data)
{
// *contract violation*
// If "Server" is not reachable, we will never be given
// the opportunity to send using the fallback server
}
[ContractInvariantMethod]
protected void ObjectInvariant()
{
Contract.Invariant(FallbackServer != null);
}
}
We can’t actually implement the fallback without adjusting the original contracts. With access to the code for the base class, we could address this shortcoming by moving the check for server availability to a separate method, as follows:
class Transmitter
{
public Server Server { get; }
[Pure]
public virtual bool ServerIsReachable
{
get { return Server.IsReachable; }
}
public virtual void SendData(Data data)
{
Contract.Requires(data != null);
Contract.Requires(ServerIsReachable);
Contract.Ensures(data.State == DataState.Sent);
Server.Send(data);
}
[ContractInvariantMethod]
protected void ObjectInvariant()
{
Contract.Invariant(Server != null);
}
}
class TransmitterWithFallback : Transmitter
{
public Server FallbackServer { get; }
[Pure]
public override bool ServerIsReachable
{
get { return Server.IsReachable || FallbackServer.IsReachable; }
}
public override void SendData(Data data)
{
if (Server.IsReachable)
{
base.SendData(data);
}
else
{
FallbackServer.Send(data);
}
}
[ContractInvariantMethod]
protected void ObjectInvariant()
{
Contract.Invariant(FallbackServer != null);
}
}
With careful planning in the class that introduces the first contract—where precondition contracts are required to go—we can get around the lack of extensibility of preconditions. Let’s take a look at how Eiffel would address this. In Eiffel, the example above would look something like the following [3]:
class TRANSMITTER
feature
server: SERVER
send_data(data: DATA) is
require
server.reachable
do
server.send(data)
ensure
data.state = DATA_STATE.sent;
end
end
class TRANSMITTER_WITH_FALLBACK
inherit
TRANSMITTER
redefine
send_data
end
feature
fallback_server: SERVER
send_data (data: DATA) is
require else
fallback_server.reachable
do
if server.reachable then
Precursor (data);
else
fallback_server.send(data)
end
end
end
The Eiffel version has clearly separated boundaries between contract code and implementation code. It also did not require a change to the base implementation in order to implement a useful feature. The author of the library has that luxury, whereas users of the library would not and would be forced to use less elegant solutions.
To sum up, it seems that, once again, the feature designers have taken the way out that makes it easier on the compiler, framework and library authors rather than providing a full-featured design-by-contract implementation. It was the same with the initial generics implementation in C#, without co- or contra-variance. The justification at the time was also that “no one really needed it”. C# 4.0 will finally include this essential functionality, belying the original assertion.
The implementation is so easy-to-use that even the documentation leads off by warning that:
“a word of caution: Static code checking or verification is a difficult endeavor. It requires a relatively large effort in terms of writing contracts, determining why a particular property cannot be proven, and finding a way to help the checker see the light. […] If you are still determined to go ahead with contracts […] To not get lost in a sea of warnings […] (emphasis added)”
Not only is that not ringing, that’s not even an endorsement.
Other notes on implementation include:
- … DEBUG builds. This is a ridiculous restriction as null-checks and other preconditions are useful throughout the development process, not just for pre-release testing. Poor-man’s DBC is currently enabled in all builds; a move to MS Contracts with the recommended separate build would remove this support, weakening the development process.

Because the feature is not a proper language extension, the implementation is forced within the bounds of the existing language features. A more promising implementation was Spec#—which extended the C# compiler itself—but there hasn’t been any activity on that project from Microsoft Research in quite some time. There are, however, a lot of interesting papers available there which offer a more developer-friendly insight into the world of design-by-contract than the highly compiler-oriented point-of-view espoused by the Contracts team.
This author will be taking a pass on the initial version of DBC as embodied by Microsoft Contracts.
Published by marco on 19. May 2009 23:07:24 (GMT-5)
When a .NET application exhibits behavior on a remote server that cannot be reproduced locally, you’ll need to debug the application directly on the server. The following article includes specific instructions for debugging ASP.NET applications, but applies just as well to standalone executables.
There are several prerequisites for remote debugging; don’t even bother trying until you have all of the items on the following list squared away or the Remote Debugger will just chortle at your naiveté.
Before you think you can get all fancy and simply debug remotely without authentication, know this: unauthenticated, native debugging does not support breakpoints, so forget it. You’ll technically be able to connect to a running application but, without breakpoints, you’ll only be able to watch any pre-existing debug output appear on the console, if that.
The following ports must be open in order for Remote Debugging to function correctly in all situations:
Protocol  Port  Service Name
TCP       139   File and Printer Sharing
TCP       445   File and Printer Sharing
UDP       137   File and Printer Sharing
UDP       138   File and Printer Sharing
UDP       4500  IPsec (IKE NAT-T)
UDP       500   IPsec (IKE)
TCP       135   RPC Endpoint Mapper and DCOM infrastructure
Additionally, the application “Microsoft Visual Studio 2008” must be in the exceptions list on the client and “Visual Studio Remote Debugging Monitor” must be in the exceptions list on the server.
Once you’ve satisfied the requirements above, you should probably also heed the following tips: it’s best to read about them now rather than learn about them the hard way later:
Here are steps you can follow to debug an application remotely. These steps worked for me, but the remote debugging situation seems to be extremely hit-or-miss, so your mileage may vary.
You’ve set up the server and attached to it so far. If anything has gone wrong, check the troubleshooting section below to see if your problem is addressed there. Now, the next steps are optional if you think you can identify your process without knowing the PID (Process ID). This is generally the case only when yours is the only .NET application deployed to that server. In that case, your process is the “w3wp.exe” process which includes “managed code”. If you don’t know your PID, follow the optional instructions below to figure out which one is yours.
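One way to do this is with a remote WMI query from the client machine; something like the following (the server name is a placeholder) lists the IIS worker processes and their PIDs:
wmic /node:"SERVERNAME" process where "name='w3wp.exe'" get ProcessId,CommandLine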
If that didn’t work, then you probably aren’t configured to query WMI remotely; your only options are to try to run it remotely using the instructions and tips below or to run it from the server.
Once you have the PID in hand, continue:
As you can probably tell from the massive list of prerequisites and recommendations as well as the 20-step guide to triggering a breakpoint, there’s a lot that can go wrong with Remote Debugging. It’s not insurmountable, but it’s not something you’re going to want to attempt unless your job pretty much depends on it. These are some of the errors I encountered along the way and how I addressed them.
You need to create a local administrator with the same password as the one you’re using on the server to run the debugging monitor.
You opened the firewall, but only for computers on the same subnet. The computer to which you are connecting is probably not on the same subnet, so you’ll need to go to the firewall settings and open them up all the way (Visual Studio will not ask again). To edit the firewall settings, do the following:
It’s also possible that the Remote Debugger is being blocked on the server side. To address this, run the “Visual Studio 2008 Remote Debugger Configuration Wizard” again; if the wizard wants to adjust firewall settings, let it do so (for internal or external networks, as appropriate to your situation – if you’re not sure, use external). To make sure that the settings were applied, run the wizard again; it should ask you about running the service, but should no longer complain about the firewall.
If it still complains about the firewall, then you’ve got another problem, which is that the setup is having trouble adjusting the settings for the firewall but isn’t telling you that it’s utterly failing when it attempts to do so. Verify that you’re running the wizard as a user that has permission to adjust the firewall settings.
The user with which you are executing Visual Studio on the client does not exist on the server or has a different password. In order to avoid adding useless user accounts to the server’s domain, you should restart your IDE using “Run as…” to set the security context to the same user as you have on the server.
You can impersonate other users, but you have to set a registry key; see Remote Debugging Under Another User Account for more information. This doesn’t help, though, if the user you are trying to use doesn’t even have an account on the remote machine.
Remote debugging sounds way cool and is the major difference between the Standard and Professional versions of Visual Studio, but it’s not for the faint of heart or the inexperienced. If you Google around a bit, you’ll notice that most people get a big heap of epic fail when they try it, and I’ve tried to make as comprehensive a guide to remote debugging as my own experience and time constraints allowed.
Here’s hoping you never have to do remote debugging (write a test instead! *smile*) but, if you do, I wish you the best of luck.
Published by marco on 18. May 2009 22:54:44 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
At Encodo, we’re using the Microsoft Entity Framework (EF) to map objects to the database. EF treats everything—and I mean everything—as an object; the foreign key fields by which objects are related aren’t even exposed in the generated code. But I’m getting ahead of myself a bit. We wanted to figure out the most elegant way of mapping what we are going to call enumerated associations in EF. These are associations from a source table to a target table where the target table is a lookup value of type int
. That is, the enumerated association could be mapped to a C# enum
instead of an object. We already knew what we wanted the solution to look like, as we’d implemented something similar in Quino, our metadata framework (see below for a description of how that works).
The goals are as follows:
EF encourages—nay, requires—that one develop the application model in the database. A database model consists of tables, fields and relationships between those tables. EF will map those tables, fields and relationships to classes, properties and sub-objects in your C# code. The properties used to map an association—the foreign keys—are not exposed by the Entity Framework and are simply unavailable in the generated code. You can, however, add custom code to your partial classes to expose those values [1]:
return Child.ParentReference.ID;
However, you can’t use those properties with LINQ queries because those extra properties cannot be mapped to the database by EF. Without restrictions or orderings on those properties, they’re as good as useless, so we’ll have to work within EF itself.
Even though EF has already mapped the constraint from the database as a navigational property, let’s add the property to the model as a scalar property anyway. You’ll immediately be reprimanded for mapping the property twice, with something like the following error message:
Since we’re feeling adventurous, we open the XML file directly (instead of inside the designer) and remove the navigational property and association, then add the property to the conceptual model by hand. Now, we’re reprimanded for not having mapped the association EF found in the database, with something like the following error message:
Not giving up yet, we open the model in the designer again and delete the offending foreign key from the diagram. Now, we get something like the following error message:
The list of line numbers indicates where the foreign key we’ve deleted is still being referenced. Despite having used the designer to delete the key, EF has neglected to maintain consistency in the model, so it’s time to re-open the model as XML and delete the remaining references to ‘FOREIGN_KEY_NAME’ manually.
We’re finally in the clear as far as the designer and compiler are concerned, with the constraint defined as we want it in the database and EF exposing the foreign key as an integer—to which we can assign a typecast enum
—instead of an object. This was the goal, so let’s run the application and see what happens.
Everything works as expected and there are no nasty surprises waiting for us at runtime. We’ve got a much more comfortable way of working with the special case of enumerated types working in EF. This special case, arguably, comes up quite a lot; in the model for our application, about half of the tables contain enumerated data, which are used as lookups for reports.
It wasn’t easy and the solution involved switching from designer to XML-file and back a few times [2], but at least it works. However, before we jump for joy that we at least have a solution, let’s pretend we’ve changed our database again and update the model from the database.
Oops.
The EF-Designer has detected the foreign key we so painstakingly deleted and re-established it without asking for so much as a by-your-leave, giving us the error of type 3007 shown above. We’re basically back where we started … and will be whenever anyone changes the database and updates the model automatically. At this point, it seems that the only way to actually expose the foreign key in the EF model is to remove the association from the database! Removing the constraint in the database, however, is unacceptable as that would destroy the relational integrity just to satisfy a crippled object mapper.
In a last-ditch effort, we can fool EF into thinking that the constraint has been dropped not by removing the constraint but by removing the related table from the EF model. That is, once EF no longer maps the destination table—the one containing the enumerated data—it will no longer try to map the constraint, mapping the foreign key as just another integer field.
This solution finally works and the model can be updated from the designer without breaking it—as long as no one re-adds the table with the enumerated data. This is the solution we’ve chosen for all of our lookup data, establishing a second EF-model to hold those tables.
It’s not a beautiful solution, but it works better than the alternative (using objects for everything). Quino, Encodo’s metadata framework includes an ORM that addresses this problem much more elegantly. In Quino, if you have the situation outlined above—a data table with a relation to a lookup table—you define two classes in the metadata, pretty much as you do with EF. However, in Quino, you can specify that one class corresponds to an enumerated type and both the code generator and schema migrator will treat that meta-class accordingly.
EF has a graphical designer, whereas Quino does not, but the designer only gets in the way for the situation outlined above. Quino offers an elegant solution for lookup values with only two lines of code: one to create the lookup class and indicate which C# enum it represents and one to create a property of that type on the target class. The Quino Demo (not yet publicly available) contains an example.
Published by marco on 18. May 2009 21:46:32 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
At Encodo, we currently run Debian Etch on our servers, with a Xen hypervisor managing a bunch of individual virtual machines (VMs). Most of the VMs also run Debian Etch, but one of them runs Windows Server 2003 instead. We use this machine for testing integration with Microsoft technologies like Sharepoint, Exchange and so on. Recently, we had to re-install the Exchange instance on that server and were faced with the problem of having to change the CD without rebooting the VM. Luckily, we found the article, Xen 3.0.3 change cdrom with windows 2003, which cryptically describes how to do this. The instructions describe pressing ctrl+alt+1, but where?
The trick is to realize that they are assuming three things:
Before you do anything, verify that you have made the physical CD/DVD available to the machine, by specifying something like the following in the XEN configuration file for the VM:
disk = [ 'file:/home/xen/domains/burken/disk1.img,ioemu:hda,w', 'phy:/dev/cdrom,hdc:cdrom,r' ]
The first disk (disk1.img
) is a disk image for the system itself; the second disk (hdc:cdrom
) is the physical CD/DVD. Until you see the CD inside the VM, you don’t have to even worry about trying to eject it.
You also need to make sure the VNC port is available, again with a line in the configuration:
vnc=1
If you make any changes to the configuration, you’ll need to restart the VM before you see the effects. Use the additional configuration option called vncpasswd
to lock down the VNC port.
Once you can see the CD within the VM and you can open a connection with the VNC viewer, you’re ready to actually follow the instructions in the post linked above:
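In rough terms, those instructions amount to switching the VNC session over to the QEMU monitor console (the ctrl+alt+number combinations toggle between the guest display and the monitor), ejecting the virtual drive, swapping the physical disc and attaching it again; the monitor commands look something like this, assuming the CD is attached as hdc as in the configuration above:
(qemu) eject hdc
(qemu) change hdc /dev/cdrom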
At this point, you might think you’re done, but the first step is a stumbling block as you don’t actually type ctrl and alt; instead, you select them from the system menu, as illustrated below:
That’s it; you should see the new CD in the VM and you can continue with your installation.
This assumes that /dev/cdrom corresponds to the CD/DVD drive in question.
Published by marco on 16. Apr 2009 13:51:42 (GMT-5)
Updated by marco on 17. Apr 2009 08:41:13 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
A developer on the Microsoft C# compiler team recently made a post asking readers to post their solutions to a programming exercise in Comma Quibbling by Eric Lippert (Fabulous Adventures in Coding). The requirements are as follows:
On top of that, he stipulated “I am particularly interested in solutions which make the semantics of the code very clear to the code maintainer.”
Before doing anything else, let’s nail down the specification above with some tests, using the NUnit testing framework:
[TestFixture]
public class SentenceComposerTests
{
[Test]
public void TestZero()
{
var parts = new string[0];
var result = parts.ConcatenateWithAnd();
Assert.AreEqual("{}", result);
}
[Test]
public void TestOne()
{
var parts = new[] { "one" };
var result = parts.ConcatenateWithAnd();
Assert.AreEqual("{one}", result);
}
[Test]
public void TestTwo()
{
var parts = new[] { "one", "two" };
var result = parts.ConcatenateWithAnd();
Assert.AreEqual("{one and two}", result);
}
[Test]
public void TestThree()
{
var parts = new[] { "one", "two", "three" };
var result = parts.ConcatenateWithAnd();
Assert.AreEqual("{one, two and three}", result);
}
[Test]
public void TestTen()
{
var parts = new[] { "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten" };
var result = parts.ConcatenateWithAnd();
Assert.AreEqual("{one, two, three, four, five, six, seven, eight, nine and ten}", result);
}
}
The tests assume that the method ConcatenateWithAnd()
is declared as an extension method. With the tests written, I figured I’d take a crack at the solution, keeping the last condition foremost in my mind instead of compactness, elegance or cleverness (as often predominate). Instead, I wanted to make the special cases given in the specification as clear as possible in the code. On top of that, I added the following conditions to the implementation:
That said, here’s my version:
public static string ConcatenateWithAnd(this IEnumerable<string> words)
{
using (var enumerator = words.GetEnumerator())
{
if (!enumerator.MoveNext())
{
return "{}";
}
var firstItem = enumerator.Current;
if (!enumerator.MoveNext())
{
return "{" + firstItem + "}";
}
var secondItem = enumerator.Current;
if (!enumerator.MoveNext())
{
return "{" + firstItem + " and " + secondItem + "}";
}
var builder = new StringBuilder("{");
builder.Append(firstItem);
builder.Append(", ");
builder.Append(secondItem);
var item = enumerator.Current;
while (enumerator.MoveNext())
{
builder.Append(", ");
builder.Append(item);
item = enumerator.Current;
}
builder.Append(" and ");
builder.Append(item);
builder.Append("}");
return builder.ToString();
}
}
Looking at this from a maintenance or understanding point-of-view, I have the following notes:
- Using the enumerator directly means the loop cannot be written as a foreach-statement.
- The repeated calls to StringBuilder.Append() are intentional. I wanted to avoid having to use escaped {} in the format string (e.g. String.Format(“{{{0} and {1}}}”, firstItem, secondItem) is confusing if you’re not aware how curly brackets are escaped in a format string).

Other than those things, it seems relatively compact and efficient. With my own version written, I looked through the comments on the post to see if any other interesting solutions were available. I came up with two that caught my eye, one by Jon Skeet and another by Hristo Deshev, who submitted his in F#.
Hristo’s example in F# is as follows:
#light
let format (words:list<string>) =
let rec makeList (words: list<string>) =
match words with
| [] -> ""
| first :: [] -> first
| first :: second :: [] -> first + " and " + second
| first :: second :: rest -> first + ", " + (makeList (second :: rest))
"{" + (makeList words) + "}"
That’s so cool: the formulation in F# is almost plain English! That’s pretty damned maintainable, I’d say. I have no way of judging the performance of this just-in-time parsing, but it does make use of recursion: lists with thousands of items will incur thousands of nested calls.
Next up is Jon Skeet’s version in C#:
public static string JonSkeetVersion(this IEnumerable<string> words)
{
var builder = new StringBuilder("{");
string last = null;
string penultimate = null;
foreach (string word in words)
{
// Shuffle existing words down
if (penultimate != null)
{
builder.Append(penultimate);
builder.Append(", ");
}
penultimate = last;
last = word;
}
if (penultimate != null)
{
builder.Append(penultimate);
builder.Append(" and ");
}
if (last != null)
{
builder.Append(last);
}
builder.Append("}");
return builder.ToString();
}
This one is very clever and handles all cases in a single loop rather than addressing special cases outside of a loop (as mine did). Also, all of the formatting elements—the curly brackets and item separators—are mentioned only once, improving maintainability. I immediately liked it better than my own solution from a technical standpoint. While I’m drawn to the cleverness and elegance of the solution, I’m not the target audience. Skeet’s version forces you to reason out the special cases; it’s not immediately obvious how the special cases for zero, one and two elements are handled. Also, while I am tickled pink by the aptness of the variable name penultimate
, I wonder how many non-native English speakers would understand its intent without a visit to an online dictionary. The name secondToLast
would have been a better, though far less sexy, choice.
It’s very easy to underestimate how little people are willing to actually read code that they didn’t write. If the code requires a certain amount of study to understand, then they may just leave it well enough alone and seek the original developer. If, however, it looks quite easy and the special cases are made clear—as in my version—they are far more likely to dig further and work with it. Since the problem is defined as three special cases and a general case, it is probably best to offer a solution where these cases are immediately obvious to ease maintainability—as long as you don’t sacrifice performance unnecessarily. Cleverness is wonderful, but you may end up severely limiting the number of people willing—or able—to work on that code.
Published by marco on 5. Apr 2009 21:02:50 (GMT-5)
Once you’ve been coding for a while, you’ll probably have quite a pile of code that you’ve written and are regularly using. It’s possible that you’ve got some older code in use that just works and on which you rely every day. At some point, though, you realize that you have to get back in there and fix a few things. That happened recently with the upgrade of the earthli WebCore and attendant applications from PHP4 to PHP5 (which is ongoing). The earthli codebase was born in 1999 and was originally designed to run on PHP3. It has been quite aggressively upgraded and rewritten since then and is thus in pretty decent condition, from the design and stability side of things.
The code formatting, however, was old-style and broke a few cardinal rules I’d picked up since 1999. There were two problems in particular that I wanted to address; chief among them were single-statement blocks without curly brackets after an if- or else-statement, e.g.:

if ($something_is_true)
  return true;
else
  return false;
Instead of just living with it, however, I did some global search/replace with regular expressions kung fu to get the code back up to snuff. I used the PCRE support in Zend Studio build 20090119, which I assume just uses the standard Eclipse search/replace support. All operations were applied solution-wide with relatively little trouble.
First, I searched for if
-statements whose contents did not start with an opening curly bracket:
Search: ([ ]+)if \(([^\n]+)\)\n[ ]+([^ {][^\n]+)
Replace: \1if (\2)\n\1{\n\1 \3\n\1}
From there, it’s relatively easy to find/replace all single-line else
-statements, by searching for the following:
Search: ([ ]+)else\n[ ]+([^ {][^\n]+)
Replace: \1else\n\1{\n\1 \2\n\1}
Then, I did the more esoteric elseif
:
Search: ([ ]+)elseif \(([^\n]+)\)\n[ ]+([^ {][^\n]+)
Replace: \1elseif (\2)\n\1{\n\1 \3\n\1}
And finally, I replaced all loop constructs:
Search: ([ ]+)foreach \(([^\n]+)\)\n[ ]+([^ {][^\n]+)
Replace: \1foreach (\2)\n\1{\n\1 \3\n\1}
Search: ([ ]+)for \(([^\n]+)\)\n[ ]+([^ {][^\n]+)
Replace: \1for (\2)\n\1{\n\1 \3\n\1}
Once I’d normalized all of the else
-statements, I could clean up else
-statements that included only a return
-statement.
Search: ([ ]+)else\n[ ]+\{\n[ ]+return ([^\n]+)\n[ ]+\}\n
Replace: \n\n\1return \2\n
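For example, a (hypothetical) block inside a function that looks like this before the replacement:

  if (!$is_valid)
  {
    return false;
  }
  else
  {
    return true;
  }

ends up like this afterwards, with the redundant else removed:

  if (!$is_valid)
  {
    return false;
  }

  return true;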
There are a lot more interesting things you can do to globally alter your code if you’re willing to put some time into building your regular expressions. Legibility is better, debugging works better and there are far fewer warnings reported by the compiler.
Published by marco on 17. Mar 2009 23:04:57 (GMT-5)
When someone posts a link to your web site on Facebook, it retrieves a preview and presents that as the default text, along with a selection of pictures it found in the page. Clearly, Facebook has some sort of scraper that extracts what it thinks is the best preview text from a given URL. Sometimes it works well, sometimes not. Luckily, you can tune your pages for Facebook requests, emphasizing the parts you think are important and belong in the preview.
It’s anybody’s guess how the scraper actually works but, at the very least, we know that it uses a special user agent when accessing your site. Given that, you can customize your response when Facebook comes calling. The user agent is given below:
facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
The earthli WebCore just recently got upgraded to detect Facebook. When a browser of unknown capabilities makes a request to a WebCore site, it generally includes a banner in the header, urging the user to download a supported browser (as shown below). Until recently, the message didn’t include HTML paragraph tags; once it acquired them, the Facebook scraper started using the warning text as the suggested summary for every link posted from WebCore sites.
This clearly would not do, therefore the earthli Browser Detector was updated to include support for detecting requests made for the purpose of extracting a preview. [1] Search engines generally frown on content-customization but Facebook can hardly complain. In the WebCore’s case, the default renderer now leaves off both the banner and footer of the page, generating only the page body, where the most important text is most likely to be.
To use the earthli Browser Detector, include the file in your PHP template and do something like the following:
$browser = new BROWSER();
if ($browser->is(Browser_previewer))
{
// Render page for Facebook (and other previewers as they are supported)
}
else
{
// Render page content for standard browsers
}
Of course, you can always do your own user-agent testing; you don’t have to use the browser detector, though it does offer many other useful capability checks and is rock-solid at browser detection.
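For example, a bare-bones check (a sketch only, not the WebCore implementation) might look like this:

$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (strpos($user_agent, 'facebookexternalhit') !== false)
{
  // Serve only the page body for the Facebook previewer
}
else
{
  // Serve the full page, including banner and footer
}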
Either way, customizing content for Facebook will go a long way to making links to your sites much easier and faster to create.
Published by marco on 16. Mar 2009 23:15:24 (GMT-5)
PHP is a programming language like any other; like any other, it’s possible to construct a bug complex enough that it can only reasonably be solved with a debugger. Granted, most PHP code is quite simple and limited to single pages with single include files and a limited library or framework. However, the advent of PHP5 has ushered in more than one team with the courage to build a full-fledged web framework. You would think that the state of PHP development had concordantly improved to the point that debugging scripts—on a local web server, at the very least—would be a no-brainer.
You’d be surprised.
The developer of the earthli WebCore [1] was courageous enough to attempt building a framework with PHP3. Since debuggers for PHP at that time (circa 1999) weren’t on anyone’s radar, PHP developers made do with the vaunted echo and print commands to simulate debugging. The WebCore quickly acquired a Javascript-based logger to which logging commands were written. Such methods only take one so far: for more complexly nested and recursive code which is more object-oriented and has a much larger stack, a debugger is really needed.
The port from PHP3 to PHP4 was accomplished without a debugger, but that was long ago. When it came time to port from PHP4 to PHP4.3, things were much harder. It is highly likely that very few developers encountered issues during that upgrade, but complex libraries with heavy use of references felt the pain. PHP4.3 was ostensibly a maintenance release but included fixes for reference handling that not only caused a tsunami of new warnings but also subtly changed how references were handled. This was a portent of the far greater problems encountered when porting from PHP4.3 to PHP5.
Developers who never developed for PHP4 will need a bit of background on how references used to work. Succinctly put, PHP4 by default created copies on variable assignment rather than assigning references. In order to get a reference instead, a developer had to explicitly request one with a special operator (&). Larger libraries with many methods soon became littered with ampersands. Even better, forgetting just one in a parameter caused PHP to create a new copy of the passed object. Changes made to that object within that routine were applied to the copy and mysteriously disappeared when the method call returned.
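To illustrate (a hypothetical example; the function and property names are made up), the difference between the two signatures in PHP4 looked like this:

// Without the ampersand, $pet is a copy; the change is lost when the function returns.
function rename_pet_broken($pet, $new_name)
{
  $pet->name = $new_name;
}

// With the ampersand, $pet refers to the caller's object and the change sticks.
function rename_pet(&$pet, $new_name)
{
  $pet->name = $new_name;
}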
This was obviously an untenable situation so PHP5—based on the Zend 2.0 engine—reversed the default. Under PHP5, assignments that previously created copies now created only references. Mix this with a large library with many such implicit copies hidden throughout the code and let the fun begin. Luckily, incredibly savvy developers who had read of this change enforced an iron discipline and limited these types of implicit copies to only a few, well-marked places. [2] Thus, the massive pain entailed in a port from PHP4 to PHP5 was somewhat ameliorated.
As you, dear reader, can imagine, the need for a debugger at this point became overwhelming. Let the search begin.
PHPEclipse was the default editor throughout the development of the earthli WebCore, so the first step was to check out what they had to offer. It turns out that the current version of PHPEclipse supports both the XDebug and DBG debuggers. The XDebug debugger was mentioned much more often in the forums, so it seemed like a good place to start.
Though you can debug PHP using only the executable (PHP.EXE), you’d have to configure that executable to include all the extensions and settings that are already configured with the local web server you’ve likely got installed. It’s not that it can’t be done, but that the most convenient way to debug would be to just execute the page on the web server you’re already using for testing. So, step one is to get the debugger extension loaded in your local server. If you’re developing on Windows [3], the WAMP server package is an excellent, highly-configurable solution. For porting from PHP4 to PHP5, it also offers the unique ability to change from one to the other within seconds. It has numerous addons corresponding to previous releases of Apache, MySql and PHP which seamlessly integrate with the main installation. What it does not have is an installer for configuring the appropriate version of either XDebug or DBG.
It seems that PHP developers don’t, in general, use debuggers.
As usual with such things online—that is, things that very few people do—instructions are available, but they must be pieced together from several different locations. XDebug was up first and was, after many false starts, loaded by the local server. Some things to watch out for:
Make sure you’re editing the right PHP.INI file; in WAMP, this is the file located in the bin folder of the currently-loaded version of Apache.

Once you are rewarded with the XDebug extension in the phpinfo() page, you’re ready to start debugging. Following the instructions at the PHPEclipse wiki, though confusing, will get you stopping at a break point soon enough. Imagine that! A breakpoint in PHP! Press F6 to step over that line and … wait … and wait … listen to the fan on your laptop start. Apache is using 100% CPU or as close to it as it can. Wait several minutes for things to sort themselves, but they never do. Use the Task Manager to kill the offending instance of Apache and you simply transfer the problem to Eclipse, which begins using 100% CPU. Long story short, debugging with XDebug never got farther than this relatively low point. The initial breakpoint worked, but nothing else.
On to DBG.
To keep things a bit shorter, DBG never even stopped at the initial breakpoint, regardless of settings.
The open-source world, it seems, has nothing to offer on the PHP debugging front.
On to Zend.
Zend makes the scripting engine used by PHP. They make numerous tools for analysis as well as the Zend Developer Studio. It costs 399 Euros and brags about its debugging capabilities right on the home page. It has a 30-day trial. It all sounds so promising. One 330MB-download and installation later and you’ve got the Zend Studio up and running. Once you’ve configured a new project, you can set up a debug configuration, which comes with copious well-written help as well as a “Test Debugger” button right in the configuration window.
As before, you can run the debug session using a local executable, but the more useful setup is to run the debugger through the local web server. To do this, you need to install the Zend debugger extension, which has much better instructions in the Zend Studio help.
Long story short, 399 Euros buys you a working debugger for PHP that flawlessly debugged code in several files—including code located in “Include Paths”—and was exactly like any other debugging experience in Eclipse.
So, if you need to debug PHP, you can either take the cheap route and hope that the open-source solutions work for your code or you can take the plunge and use the Zend Studio—if you’re actually earning money with PHP development, choice (B) is the logical one.
This PHP developer, on the other hand, is going to get his port from PHP4 to PHP5 done in the next 30 days.
Published by marco on 16. Mar 2009 10:01:08 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
In August of 2008, Microsoft released the first service pack (SP1) for Visual Studio 2008. It included the first version (1.0) of Microsoft’s generalized ORM, the Entity Framework. We at Encodo were quite interested as we’ve had a lot of experience with ORMs, having worked on several of them over the years. The first was a framework written in Delphi Pascal that included a sophisticated ORM with support for multiple back-ends (Sql Server, SQLAnywhere and others). In between, we used Hibernate for several projects in Java, but moved on quickly enough. [1] Most recently, we’ve developed Quino in C# and .NET, with which we’ve developed quite a few WinForms and web projects. Though we’re very happy with Quino, we’re also quite interested in the sophisticated integration with LINQ and multiple database back-ends offered by the Entity Framework. Given that, two of our more recent projects are being written with the Entity Framework, keeping an eye out for how we could integrate the experience with the advantages of Quino. [2]
What follows are first impressions acquired while building the data layer for one of our projects. The database model has about 50 tables, is highly normalized and is pretty straightforward with auto-incremented integer primary keys everywhere, and single-field foreign keys and constraints as expected. Cascaded deletes are set for many tables, but there are no views, triggers or stored procedures (yet).
Eventually, EF will map your model and the runtime performs admirably (so far). However, designing that model is not without its quirks:
To be fair, this is a 1.0 release; it is to be expected that there are some wrinkles to iron out. However, one of the wrinkles is that a model with 50 tables is considered “large”.
With 50 tables and the designer slowing down, you’re forced to at least consider options for splitting the model. The graphic below shows the model for our application:
There exists official support for splitting the model into logical modules, but it’s just a bit complex; that is, you have to significantly change the generated files in order to get it to work and there is no design-time support whatsoever for indicating which entities belong to which modules. The blog posts by a member of the ADO.NET team called Working With Large Models In Entity Framework (Part 1 and Part 2) offer instructions for how to do this, but you’ll have to satisfy one of the following conditions:
The low-level, runtime-only solution offered by the ADO.NET team ostensibly works, though it probably isn’t very well-tested at all. Designer and better runtime-integration would be key in supporting larger models, but the comments at the second blog post indicate that designer support likely won’t make version 2 of the Entity Framework. This is a shocking admission, as it means that EF won’t scale on the development side for at least two more versions.
The designer is probably the weakest element of the Entity Framework; it is quite slow and requires a lot of work with the right mouse-button.
If you’re right in the middle of a desperate action to avoid reverting to the last version of your model from source control, you’ll be pleased to discover that, sometimes, Visual Studio will prevent you from opening the model, either in visual- or XML-editing mode. Neither a double-click in the tree nor explicitly selecting “Open” from the shortcut menu will open the file. The only thing for it is to re-open the solution, but at least you don’t lose any changes.
The biggest time-sink in EF is the questionable synchronization with the database. Often, you will be required to intervene and “help” EF figure out how to synchronize—usually by deleting chunks of XML and letting it re-create them.
One example: a relationship whose multiplicity has to be changed by hand from 1 (One) to 0..1 (Zero or One).

Here’s a development note written after making minor changes to the database:
“I added a couple of relationships between existing tables and there were suddenly 17 compile errors. I desperately tried to delete those relationships from the editor, to no avail. I opened it as XML and started deleting the affected sections in the hopes that I would be able to compile again and re-sync with the database. After a few edits, the editor would no longer open and the list of errors was getting longer as the infection spread; I would have to cut out the cancer. The cancer, in this case, was all of the classes involved in the new relationships. Luckily, they were mostly quite small and mostly used the identifiers from the database. [4] Once the model compiled again (the code did not build because it depended on generated code that was no longer generated), I could open the editor and re-sync with the database. Now it worked and had no more problems. All this without touching the database, which places the blame squarely on EF and its tendency to get confused.”
As you can imagine, adventures like these can take quite a bit of time and break up the development flow considerably.
The problem with dates all starts with this error message:
Be prepared to guess which of your several DateTime fields is causing the error because the error message doesn’t mention the field name. Or the class name either, if you’ve had the audacity to add several different types of objects—or, God forbid, a whole tree of objects—before calling SaveChanges().
This error may come as a surprise because you’ve actually set default values in the database for all non-nullable date-time fields. Unfortunately, the EF schema reader does not synchronize non-scalar default values, so the default value of getdate() set in the database is not automatically included in the model. Since the entity model doesn’t know that the database has a default value, but it does know that the field cannot be null, it requires a value. If you don’t provide a value, the mapper automatically assigns DateTime.MinValue. The database does not accept this value, so we have to set it ourselves, even though we’ve already set the desired default on the database.
To add insult to injury, the designer does not allow non-scalar values (e.g. you can’t set DateTime.Now in the property editor), so you have to set non-scalar defaults in the constructors that you’ll declare by hand in the partial classes for all EF objects with dates [5].
In order to figure out which date-time is causing a problem once you think you’ve set them all, your best bet is to debug the Microsoft sources so you can see where ADO.NET is throwing the SqlClientException. The SQL Profiler is unfortunately no use because the validation errors occur before the command is sent to the database. To keep things interesting, the Entity Framework sources are not available yet.
The documentation recommends using TransactionScope transactions, which use the DTS (Distributed Transaction Services). If the database is running locally, you should have no troubles; at the most, you’ll have to start the DTC [6] services. If the database is on a remote server, then you’ll need to do the following:
Any troubles you may experience with the DTC are unrelated to EF development; they’re just the pain of working with highly-integrated and security-aware software. That’s not to say that the experience is pleasant when something is mis-configured, but that I am reserving judgment until a later point in time.
The following section includes solutions for specific errors that crop up more often during EF model development.
Error 1 Error 3007: Problem in Mapping Fragments starting at lines 1383, 1617: Non-Primary-Key column(s) [ColumnName] are being mapped in both fragments to different conceptual side properties − data inconsistency is possible because the corresponding conceptual side properties can be independently modified.
You have most likely mapped the property identified by ColumnName as both a scalar and navigational property. This usually happens in the following situation:
To fix the conflict, simply remove the scalar property manually.
You have most likely created a cascading relationship in the database and the EF editor has failed to properly update the model. It seems that there is no way to determine from the designer whether or not an association has delete or update rules. According to the blog post, Cascade delete in Entity Framework, the designer sometimes fails to update the association in both the physical and entity mappings in the XML file, so you have to add the rule by hand. See the article for instructions.
The database-design phase is more difficult than it should be, but it is navigable. You end up with a very usable, generated set of classes which nicely integrate with data-binding on controls. We will soldier on and bring news of our experiences on the runtime front.
[5] If your tables have timeCreated/timeModified fields, you’ve got a lot of work ahead of you.
[6] The documentation refers to the DTS—the Distributed Transaction Services. However, the actual Windows service is called the DTC—the Distributed Transaction Coordinator.
Published by marco on 15. Feb 2009 21:24:32 (GMT-5)
After nearly a decade of using Perforce for my private source control, I’d decided to switch to Mercurial. Mercurial is a distributed version control system and open-source and all kinds of awesome and I won’t go into why I made the switch here. Suffice it to say it makes it much easier to release code and work with others.
Mercurial itself is an easy installation and I had it running on both my OS X 10.4 and Windows XP machines in a flash. I even installed the newly released TortoiseHg plugin, which works as advertised even though its user interface simply screams open-source. Now, Mercurial is nothing without a server, so I set about setting up one of those too. There’s nothing like jumping in the deep end when you’re a complete neophyte.
I’ve got one project that I’d like to share publicly and a handful of private projects that I’d like to store on the server, but work on from several places (e.g. the Mac or Windows). Now, with Mercurial, every repository contains a complete history of the project, so the designation of the server as the “main” storage is a feature of my deployment system, not a requirement imposed by Mercurial [1].
Mercurial has decent instructions for setting up an http server; they provide CGI scripts for both single and multiple repositories. Once you’ve got that, you’ll want to set up which users are allowed to push updates to your repository. Mercurial strongly recommends you use only SSL connections for push operations; you can shut off the requirement easily enough, but it’s a good recommendation.
So far, no real problems have cropped up. Until you go to the web site and see the circa 1998-style horror that is the default style sheet. Running hg version reveals that you’re running version 0.9.1, released in 2005. Thanks, Debian. Way to stay on top of things. [2] A quick check of Debian Backports reveals that a newer version, 1.0.1.2, is available. Grab that and install it, then enable one of the newer themes—I chose paper, which looked nice and neat—and refresh the page. Sweet! Click a link. Shit! Purple python error messages everywhere. Follow the stack trace to the bottom of the page and it seems to be complaining about an unset variable in a dictionary…bla, bla, bla.
Maybe there are some missing mercurial or python modules. A quick check of the recommended and suggested modules with apt-get reveals nothing significant.
Maybe the Python version is wrong. Well, look at that, Debian Etch is still using Python 2.4 by default. Python 2.5 was released in September of 2006. It’s just possible that Mercurial—especially the perhaps less-well-tested CGI script—might be a wee bit surprised to find itself on a runtime that’s almost three years out-of-date. That’s a decent theory, I think. A quick apt-get install python2.5 grabs the latest version from Etch (which is probably also horribly old, but no matter) and … has no effect. The Debian installer does not set up the newer Python as the system default; it doesn’t even ask if you’d like to do this. Perhaps there’s a good reason for this… [3]
Long story short, I couldn’t get the Mercurial 1.0.1 web CGI any more stable than it was on my initial attempt and must instead assume that it’s just broken in that version. I rolled back to Python 2.4 and Mercurial 0.9.1 and everything started working again, though my eyes still teared up when I had to look at the repository web site. I couldn’t figure out any way of getting a newer version of Mercurial to run on Debian Etch. A pity, but there it is. I ended up making my own theme and adjusting the style sheet enough so that I could use it less than an hour after having eaten without endangering my keyboard.
I thought my Python/Debian adventures had ended until I started my server backup script, which uses rdiff-backup to synchronize several directories from my server. I use rdiff-backup 1.1.15 on my Mac and remotely control an instance on the Debian Etch server. Setting this up with compatible versions was not the easiest thing in the world and seems to have been more fragile than I thought. My Python reconfigurations had removed rdiff-backup from the server because it had the Python2.4 package as a dependency. I quickly re-installed rdiff-backup, but it was permanently offended and continued to give the same error message, which was, once again, something about some file or function or variable that it had expected to be set that was most emphatically not set and that it was going to, as a result, quit in a huff.
I know that it’s my fault for having used Debian Backports instead of being happy with three-year-old software, but knowing that doesn’t make me any happier to be, once again, debugging a version mismatch error in rdiff-backup. I have, in fact, decided to tempt fate and forget about that part of the backup for this week [4]. I’m sure my self from one week from now will bloody hate me for it, but I’m going to bed.
In a pinch, I can just pull from all of those repositories, which happens in a flash. All in all, I’m quite happy with Mercurial, even though with 0.9.1, I’m still in 2005.
Published by marco on 1. Feb 2009 13:25:15 (GMT-5)
Courier IMAP has a default certificate for SSL communication, but it’s only valid for a year and has bogus, default information in it. You can use a utility to generate a new certificate and, with a little perseverance, find the configuration file from which it draws its parameters. With these parameters, you can make a slightly better certificate, but it’s better to use OpenSSL to generate a proper certificate, based either on a trusted certificate or self-signed. However, OpenSSL’s default output does not include the combined private key/certificate file expected by Courier. To do that, I adapted the instructions found in Courier IMAP SSL Certificate Installation to create the combined PEM file and reference it from the courier configuration file.
In my case, I just re-used the certificates I’d already generated for TLS SMTP access with Postfix, which I’d stored at /etc/postfix/keys/
. All instructions are for a Debian Etch installation. Open a text editor and paste the contents of the primary certificate and the private key one after another in the following order:
Include the BEGIN and END tags on each. The result should look like this:
-----BEGIN CERTIFICATE-----
(Your Primary SSL certificate: server.crt)
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
(Your Private Key: server.key)
-----END RSA PRIVATE KEY-----
Save the combined file as server.pem.
Finally, open the /etc/courier/imapd-ssl file and update the following value to reference the new PEM file:
TLS_CERTFILE=/etc/postfix/keys/server.pem
Restart the Courier server by executing /etc/init.d/courier-imap-ssl restart and you’re done.
Published by marco on 21. Oct 2008 21:09:05 (GMT-5)
If you’re faced with a pile of data that needs to be sorted, you can use the Animated Sorting Algorithms by David R. Martin to decide which algorithm to use, based on what kind of data you think you’re going to have. Click a little green refresh symbol in the rows to watch the algorithms race on the same dataset or click a column header to watch the same algorithm attack best- and worst-case scenarios simultaneously.
Published by marco on 28. Jul 2008 19:14:04 (GMT-5)
The recently-published RFC: Lambda functions and closures by Christian Seiler & Dmitry Stogov (PHP) has all the details about a patch for PHP that adds support for closures to PHP 5.3 and higher. It looks like the proposal was initially made in December of 2007 and the process lasted about half a year. Compare and contrast with Java’s long struggle to get closures, which isn’t over by half [1].
The syntax is pretty straightforward, though not as elegant as the lambda syntax adopted by C# (and likely others of whose syntax I’m not aware, like Ruby or Python). Reference variables are supported, but copy-by-value is the default for local variables (as is the norm for PHP). Local variables referenced in the closure must be listed explicitly, for two reasons detailed in the RFC.
A lambda construct has a nameless function followed by a parameter list followed by an optional use clause that lists the local variables that may be referenced from the closure. A closure defined within a class automatically includes a reference to $this. [3] Here’s an example of a replacer function that defines an anonymous replacer function—customized for the parameters passed to the outer method—which is, in turn, passed to array_map.
function replace_in_array ($search, $replacement, $array) {
$map = function ($text) use ($search, $replacement) {
if (strpos ($text, $search) > 50) {
return str_replace ($search, $replacement, $text);
} else {
return $text;
}
};
return array_map ($map, $array);
}
The closure header and closing brace are highlighted in the example above; the body of the closure is just normal PHP code with the additional restriction that the scope is limited to the contents of the parameters and “use” clauses.
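Calling the outer function is then just like calling any other function; a quick, made-up example:

$paragraphs = array(
  'The colour of the car was a deep red.',
  'Another paragraph of text.',
);

// array_map() applies the customized closure to each element and returns a new array.
$result = replace_in_array('colour', 'color', $paragraphs);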
All in all, it seems like a solid addition to PHP and should lead to cleaner code for those using the latest version.
[3] Unless the closure is declared as a static function.
Published by marco on 22. Apr 2008 18:46:33 (GMT-5)
Updated by marco on 22. Apr 2008 18:53:44 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Metadata is, by definition, data about an application’s data. It describes the properties and capabilities of and connections between different types of information in a particular application domain. Examples of application domains are bookkeeping, document management, a lending library or inventory management.
Listed below are a few commonly used terms for modeling methodologies:
The approach recommended in the following paper can be generally described as MDD, but diverges from traditional approaches in several key ways.
All software defines and uses metadata, but usually defines it implicitly, encoding it in the constructs of the programming language (e.g. classes, fields, methods, references, etc.) in which it is written. A non-trivial application—one that renders a GUI or responds to a web-service request—needs much, much more information than that. More often than not, that extra information is encoded in application-specific code that is usually not generalized (though it may follow a pattern).
Software increases in complexity and size with time. In order to maintain any sort of control over larger software projects, it is important to strike a balance between the following two precepts:
D.R.Y. (Don’t Repeat Yourself)
K.I.S.S. (Keep It Simple, Stupid)
Using a standardized metadata framework helps an application apply the D.R.Y. principle, but seems to violate K.I.S.S. However, applying K.I.S.S. to the metadata framework itself results in a simple API that makes any application using it also much simpler than it would have been with an ad-hoc solution. Using the same simple pattern and library throughout several applications decreases the overall complexity and drastically improves maintainability.
Imagine an application that generates a list of entities, say Books. Regardless of whether it generates a report, a web page or a graphical user interface (GUI), it needs to identify columns of data with labels. Traditionally, each application solves the problem of retrieval and internationalization of labels on its own. Programmers without much time or experience—and without framework support—will most likely hard-code the text for the labels, making maintenance more time-consuming and error-prone.
A metadata-based application, on the other hand, starts off with a model, in which the labels for bits of data are already defined. This application need concern itself only with transforming the information in the metadata to the output format and not with retrieval and internationalization of the metadata itself. On top of that, the application is free to use other available metadata which maps to the output format, like color, borders or font-size.
Apps using metadata have the following advantages:
Those few systems that do use explicit metadata do so in order to provide object-relational mapping (ORM) and CrUD (Create/Update/Delete) access to a database. To name just a few:
These solutions use various mechanisms to specify the metadata (uniqueness, primary key, relationships, etc.) needed to communicate with a database: Cayenne uses XML files, Hibernate allows either XML files or Java annotations, LINQ uses the class definition along with optional attributes and Django uses inner classes.
Django goes further still by providing metadata for web UIs as well as data access. It uses this metadata to generate the entire administrative back-end—including sortable, filterable lists and CrUD UI for all objects—automatically. The .NET framework has grids that automatically provide CrUD from a LINQ dataset, but these cannot be customized by tweaking central metadata, as in Django. Other frameworks have been similarly inspired—Rails (Ruby) and Grails (Groovy) come to mind—but are still in very early stages and don’t seem to use much central metadata. For example, the BeanForm component in Tapestry generates web forms from objects using Java reflection and Hibernate properties (if present). However, it can only be further customized (e.g. styles and classes or layout) with metadata hard-coded in the HTML document.
All of the approaches above use the class hierarchy as the model and the reflection/introspection services of the language itself as the metadata API. Metadata not directly expressible in the language is attached using attributes (LINQ), annotations (Hibernate) or inner classes (Django).
This raises a few issues:
With metadata so central to the application, it makes no sense to cede control of it to a single external component (like an ORM). Instead, it must be independently and centrally defined and under application control.
In order to be truly useful, metadata must satisfy the following conditions:
None of the requirements listed above places restrictions on the software environment. Rather, it emphasizes only that none of the components gets to control the metadata. For example, it is trivial to generate the data classes needed by Hibernate or LINQ from the metadata. Similarly, nothing prevents a project from generating the in-memory metadata from more traditional modeling approaches like UML.
The application benefits from the centralized metadata, but can continue to use any external components without restriction.
The first step is to define the boundaries of an application domain with entities, properties, operations and relationships (described in section 5 – “Elements of metadata”). However, as mentioned above, a non-trivial application needs specialized metadata in order to work with one or more external components, such as:
To address these different needs—and to stay centralized but decoupled—metadata is split into aspects, each of which encapsulates one of the tasks listed above. For example, an application might want to store the following information in its metadata:
Aspects that are very similar, like “gui” and “web”, can put shared metadata into another aspect (e.g. “view”). This way, common metadata, like “color” and “font-size” are in the “view” aspect, while browser-specific metadata is defined in the “web” aspect. A desktop application includes the “gui” and “view” aspects, while a web application includes “web” and “view”. The reporting tool mentioned above includes only “view” so that it has access to the required display metadata.
The sections below briefly sketch the basic parts of metadata and are not meant to be comprehensive.
Since metadata is descriptive, all of its elements have the following basic properties:
Complicating things slightly, each textual property—like the “singular title” above—has multiple values, one for each language supported by the application. In addition to these basic, shared properties (of which there can be many, many more), specific types of elements add other features.
A lending library application is used in the examples below.
Entities describe particular types of objects in the application domain, like books, authors, publishers or customers. Less obvious entities are languages, media types, genres or lending transactions (when a customer loans a book for a period of time). Entities include the following lists: properties, operations and relationships.
Each of these is described in more detail in the following sections.
A property is a single feature of an entity, like a book’s title, an author’s first or last name or the due date for a loaned book. In addition to the basic features above (title, description), a meta-property has the following features:
There are, of course, dozens of other features that an application might associate with a property, but things like “color”, “font-size” or “visible” are only interesting to particular application domains. Therefore, these features belong in domain-specific aspects (as described above in “Aspects of Meta-Data”).
An operation is an action that can be executed against an entity (or, more precisely, an instance of an entity). In addition to the basic features above (title, description), a meta-operation has the following features:
Including the signature in the metadata is useful for validation, but the implementation is best left in the application itself (including it in the metadata would violate K.I.S.S.).
An application domain consists of more than just free-floating entities: it is the relationships between those entities that truly describe what is possible within a metadata model. A book has a list of authors as well as a publisher, whereas publishers and customers have lists of books. A relationship has the following features:
As with properties, these are the basic features that all relationships have to describe them fully; domain-specific aspects may add more.
Using metadata explicitly is the kind of approach that comes after years of experience working with other methodologies and technologies. It arises from a need to avoid re-inventing the wheel with each new application. We started Encodo after working for years with modeling tools that offered some—though not all—of the advantages mentioned in this paper.
Most available components and libraries, however, don’t work with metadata as we’d grown accustomed to. We tried using them as-is but, soon enough, realized we could adhere to neither K.I.S.S. nor D.R.Y. principles. Having had the advantage of using metadata in previous applications, our standards were raised to a level where we were no longer willing to work without it.
Published by marco on 21. Apr 2008 20:25:27 (GMT-5)
Updated by marco on 21. Apr 2008 20:25:41 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
The first publicly available version of the Encodo C# Handbook is ready for download! It covers many aspects of programming with C#, from naming, structural and formatting conventions to best practices for using existing and developing new code.
Here’s the backstory on how and why we decided to write a formal coding handbook.
Here at Encodo, we started working with C# less than a year ago. We decided early on that we would be building a framework on which we would base our projects, both internal and external. That framework now exists and forms the core of several client projects: it’s called “Quino” and you can find out more at the Quino home page. Since we were library-oriented from the get-go, we were very aware of our coding style and were interested to know how other projects and developers organized and formatted their code and how they worked with the .NET framework.
Naturally, there was a lot of documentation to be found in Microsoft’s MSDN, but it was scattered over dozens of pages and wasn’t very useful as a consolidated reference. It also made recommendations that Microsoft themselves ignored in their own code. Searching with Mr. Google brought up numerous references to a manual from iDesign, which is quite good. Philips also has a pretty extensive manual.
We started with those as well as a bushel of ad-hoc rules we’d developed over the years and an “Encodo Style” slowly evolved. Where we diverged from other companies is that we decided to write it all down. Every last niggling bit of it. The handbook was in a very ad-hoc format when we hired Marc and realized that we’d need to get him up to speed on how we work at Encodo. After an initial formatting effort, there followed a few months of slow accretion of new rules as well as a refinement of existing ones.
Where our guide differs from the others is in the organization; there are clear sections for structure, formatting, naming, language elements and best practices instead of just a hodge-podge of rules. We’ve also done our best to weed out conflicting or repeated rules. The current handbook (version 1.4) also includes rules for those of you, like us, who’ve moved on to VS2008 and the wonderful world of .NET 3.5.
Though there will certainly be updates as we learn more, we hope you like what we’ve got so far and welcome any and all feedback!
For your quick perusal, here’s the current table of contents:
Table of Contents
1 General 1.1 Goals 1.2 Scope 1.3 Fixing Problems in the Handbook 1.4 Fixing Problems in Code 1.5 Working with an IDE
2 Design Guide 2.1 Abstractions 2.2 Inheritance vs. Helpers 2.3 Interfaces vs. Abstract Classes 2.4 Modifying interfaces 2.5 Delegates vs. Interfaces 2.6 Methods vs. Properties 2.7 Virtual Methods 2.8 Choosing Types 2.9 Design-by-Contract 2.10 Controlling API Size
3 Structure 3.1 File Contents 3.2 Assemblies 3.3 Namespaces 3.3.1 Usage 3.3.2 Naming 3.3.3 Standard Prefixes 3.3.4 Standard Suffixes 3.3.5 Encodo Namespaces 3.3.6 Grouping and ordering
4 Formatting 4.1 Indenting and Spacing 4.1.1 Case Statements 4.2 Brackets (Braces) 4.2.1 Properties 4.2.2 Methods 4.2.3 Enumerations 4.2.4 Return Statements 4.3 Parentheses 4.4 Empty Lines 4.5 Line Breaking 4.5.1 Method Calls 4.5.2 Method Definitions 4.5.3 Multi-Line Text 4.5.4 Chained Method Calls 4.5.5 Anonymous Delegates 4.5.6 Lambda Expressions 4.5.7 Ternary and Coalescing Operators
5 Naming 5.1 Basic Composition 5.1.1 Valid Characters 5.1.2 General Rules 5.1.3 Collision and Matching 5.2 Capitalization 5.3 The Art of Choosing a Name 5.3.1 General 5.3.2 Namespaces 5.3.3 Interfaces 5.3.4 Classes 5.3.5 Properties 5.3.6 Methods 5.3.7 Parameters 5.3.8 Local Variables 5.3.9 Events 5.3.10 Enumerations 5.3.11 Generic Parameters 5.3.12 Lambda Expressions 5.4 Common Names 5.4.1 Local Variables and Parameters 5.4.2 User Interface Components 5.4.3 ASP Pages
6 Language Elements 6.1 Declaration Order 6.2 Visibility 6.3 Constants 6.3.1 readonly vs. const 6.3.2 Strings and Resources 6.4 Properties 6.4.1 Indexers 6.5 Methods 6.5.1 Virtual 6.5.2 Overloads 6.5.3 Parameters 6.5.4 Constructors 6.6 Classes 6.6.1 Abstract Classes 6.6.2 Static Classes 6.6.3 Sealed Classes & Methods 6.7 Interfaces 6.8 Structs 6.9 Enumerations 6.9.1 Bit-sets 6.10 Nested Types 6.11 Local Variables 6.12 Event Handlers 6.13 Operators 6.14 Loops & Conditions 6.14.1 Loops 6.14.2 If Statements 6.14.3 Switch Statements 6.14.4 Ternary and Coalescing Operators 6.15 Comments 6.15.1 Formatting & Placement 6.15.2 Styles 6.15.3 Content 6.16 Grouping with #region Tags 6.17 Compiler Variables 6.17.1 The [Conditional] Attribute 6.17.2 #if/#else/#endif
7 Patterns & Best Practices 7.1 Safe Programming 7.2 Side Effects 7.3 Null Handling 7.4 Casting 7.5 Conversions 7.6 Object Lifetime 7.7 Using Dispose and Finalize 7.8 Using base and this 7.9 Using Value Types 7.10 Using Strings 7.11 Using Checked 7.12 Using Floating Point and Integral Types 7.13 Using Generics 7.14 Using Event Handlers 7.15 Using “var” 7.15.1 Examples 7.16 Using out and ref parameters 7.17 Error Handling 7.17.1 Strategies 7.17.2 Error Messages 7.17.3 The Try* Pattern 7.18 Exceptions 7.18.1 Defining Exceptions 7.18.2 Throwing Exceptions 7.18.3 Catching Exceptions 7.18.4 Wrapping Exceptions 7.18.5 Suppressing Exceptions 7.18.6 Specific Exception Types 7.19 Generated code 7.20 Setting Timeouts 7.21 Configuration & File System 7.22 Logging and Tracing 7.23 Performance
8 Processes 8.1 Documentation 8.1.1 Content 8.1.2 What to Document 8.2 Testing 8.3 Releases
Published by marco on 19. Feb 2008 21:41:06 (GMT-5)
Microsoft recently released documentation for their binary office formats in both PDF and their own XPS format. The PDF for Word weighs in at 2.8MB and has 210 pages. The article Why are the Microsoft Office file formats so complicated? by Joel Spolsky provides a lot of good reasons for why the formats are so complicated (most rooted in history), like speed, complexity of the task, purely internal formats (until now), etc.
Where Spolsky veers off the path (and he almost always does) is in reaching a bit too far with his “workarounds”. Instead of trying to load the binary formats yourself, he suggests simply launching Word or Excel as a COM object “directly, even from ASP or ASP.NET code running under IIS”. The caveat comes only later and tells only of a “few gotchas”, like it “not [being] officially supported by Microsoft”. He includes a link to a knowledge base article which uses a lot of words to say the equivalent of “for the love of the sweet baby Jesus, don’t do this.” Clearly, Spolsky was so enchanted by his prose and clever examples that he didn’t think Microsoft explicitly countermanding his idea was enough reason not to publish it to the world [1].
His advice to hide this type of solution behind a web service for Linux servers actually goes for ASP.NET servers as well. If you need to read the Office format (or generate it), there are libraries that do this without using Office itself. The POI java library from Apache works quite well for generating Excel and Word documents. If you’re using .NET, you can hide the POI library behind a web service and call that instead. Even a Tomcat server to run a little web service won’t weigh more than running Office in a Windows 2003 Server. If you do have to run a Windows 2003 Server in a Linux environment, consider running it in a virtual machine under Xen or some other virtualization solution.
Some of the other suggestions also indicate that Spolsky was just trying to fill out his bullet lists, like “[o]pening an Excel workbook, storing some data in input cells, recalculating, and pulling some results out of output cells”—that sounds like the kind of stuff you could just write in .NET or Java directly, no? [2] Or what about “[u]sing Excel to generate charts in GIF format”—there are libraries for that, aren’t there? Do you really have to consider automation in a server process (including a likely bottlenecking nightmare) just to generate a chart?
Happily, he closes strongly with good suggestions for generating the least complex format possible for fulfilling the task, such as using RTF for formatted documents (it’s a text format, reasonably legible, and is well-documented) or CSV for simple Excel data.
In the end, the formats for the office applications are published. This is what Microsoft deals with in their office products—there’s no use complaining that they’re too complicated. They are what they are and most people should be able to avoid having to deal with them—unless you do something silly like joining the Office development team in Redmond.
Published by marco on 5. Feb 2008 23:16:21 (GMT-5)
Updated by marco on 5. Feb 2008 23:48:59 (GMT-5)
The article Two-Party Threaded Chat by Peter Arrenbrecht addresses the problem of multiple threads of discussion within a single conversation. Without face-to-face contact, the threshold for interruption is much lower and answers will not always neatly line up under their questions. A conversation may have many of these “threads”, though individual ones are usually quite short-lived. Chat clients are currently limited by their purely serial approach to inserting text into a conversation.
The solution proposed in the article is quite good, and the discussion below only expands slightly on those concepts, while illustrating them with fake screenshots in iChat-style. iChat’s approach of using talk bubbles from opposing sides is somewhat easier to follow than the text-only examples provided in the original article and even the multi-thread view in the demo (which is likely to be a bit much for the average user).
The problem boils down to insertion points for comments. Chat clients today always place new text at the bottom of the conversation, whether or not that is the most appropriate insertion point. The proposed extension is to give the user control over the insertion point and also to automatically propose the best insertion point, if possible.
In the classic case, one party asks a question or makes a comment. The other party types a message and sends it, inserting it at the end of the conversation. If the first party did not send any messages in the meantime, then the question and answer are lined up correctly. The example below shows the insertion point as displayed in iChat as one party types (l.) and the inserted message (r.).
Whenever a message arrives from one party while the other party is still typing, that counts as an interruption and there are now two possible insertion points: the point in the conversation at which the receiver started typing and the end of the conversation. The user should be able to switch between these insertion points (either with a mouse click or a key combination to toggle between them), but the client should default to inserting into the original location. Once the user selects a point, they both disappear and the text appears where inserted (in the original insertion point in the example below).
As you can see from the example, there can only ever be two insertion points because the user can only be typing one message and can only be interrupted once. As soon as the user sends a message, the “thread” is closed and the client manages a single insertion point again. [1]
Of course, clients that do not support this feature will continue to display conversations from newer clients serially, as they do currently.
Peter mentions being able to specifically reference a comment all the way at the end of his proposal (under “Implementation”). I (and several people with whom I chat) have been using the @-symbol for this purpose for years, using the following syntax:
@[label]: [comment]
The label refers to a unique word in the comment from the other party to which the new comment refers. It’s essentially poor-man’s threading for the dumb chat clients available today—but it’s already quite effective even without client support. When a new client sees such a targeted insertion, it should seek backwards through the conversation to find that word, then insert the new comment immediately after the sentence or comment that contained it. The exact algorithm can be refined, but let’s take a look at a simple example.
Let’s take a look at a conversation initiated by someone who makes several points at once, piling up questions for the other party to answer. The other party then answers the questions in order, using the @-symbol to let the other party know which answers correspond to which questions. In a modern chat client, this looks like one of the screenshots below (all comments in one block on the left; comments separate on the right):
Let’s proceed with the example on the left, showing the first response from the second party below:
Since the response is not targeted, it just shows up below the whole block of text. The next response is targeted—using @dinner—and so causes the sentence with the word “dinner” in it (as well as everything following it, which in this case is nothing) to be split off from the main text in both clients. That response is shown below:
The third response targets a chunk out of the original text—using @troops—so that sentence is extracted and the first response moves up to remain under all the remaining untargeted text. That response is shown below:
A client that understands targets in the text (and adjusts the display accordingly) could also just remove those targets from the displayed text, as shown below.
The client now looks as if the conversation had happened in the correct order, all with the help of targeting.
All of the targets in the example above are still on-screen, so the chat client doesn’t have to do anything other than move the blobs of text around. However, if a target has already scrolled off-screen, the client should display that part of the conversation again. One way to do this is to split the window to show the target as well as the end of the conversation, as shown below.
In this case, both regions are scrollable, showing different locations within the same conversation. Since these targeted discussion threads rarely stay open for more than a few comments, the client doesn’t need to offer any way of switching between the splitter areas. In order to continue the top part of the conversation in the example above, a user could either re-use the @troops label (in which case the comment is inserted after all other comments that used that label) or target a word in that comment, like @insane. If no label is used, the comment is inserted at the end of the conversation, as usual.
This approach keeps the number of things a user has to know to a minimum, letting all new functionality be controlled by the use of labels.
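For client authors, the backward search described above is simple to sketch. The following is a minimal illustration only (it assumes the conversation is a flat array of message strings; a real client would match whole words and track per-label insertion points):

// Returns the index at which a comment targeted with @$label should be inserted:
// directly after the most recent message containing the label, or at the end of
// the conversation if the label doesn't appear anywhere.
function find_insertion_index($messages, $label)
{
  for ($i = count($messages) - 1; $i >= 0; $i--)
  {
    if (stripos($messages[$i], $label) !== false)
    {
      return $i + 1;
    }
  }

  return count($messages);
}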
Published by marco on 4. May 2007 16:16:31 (GMT-5)
Updated by marco on 5. May 2007 08:52:21 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
The term DRY—Don’t Repeat Yourself—has become more and more popular lately as a design principle. This is nothing new and is the main principle underlying object-oriented programming. As OO programmers, we’ve gotten used to using inheritance and polymorphism to encapsulate concepts. Until recently, languages like C# and Java have had only very limited support for re-using functionality across larger swathes of code. [1] To illustrate this, let’s take a look at a simple class with a descendent as well as some code that deals with lists of these objects and their properties.
Let’s start with some basic definitions [2]:
class Pet
{
public string Name
{
get { return _Name; }
}
public bool IsHouseTrained
{
get { return _IsHouseTrained; }
}
private string _Name;
private bool _IsHouseTrained = true;
}
class Dog : Pet
{
public void Bark() {}
}
class Owner
{
public IList<Pet> Pets
{
get { return _Pets; }
}
private IList<Pet> _Pets = new List<Pet>();
}
This is basically boilerplate for articles about inheritance, so let’s move on to working with these classes. Imagine that the Owner wants to find all pets named “Fido”:
IList<Pet> FindPetsNamedFido()
{
IList<Pet> result = new List<Pet>();
foreach (Pet p in Pets)
{
if (p.Name == "Fido")
{
result.Add(p);
}
}
return result;
}
Again, no surprises yet. This is a standard loop in C#, using the foreach construct and generics to loop through the list in a type-safe manner. Applying the DRY principle, however, we see that we’re going to end up writing a lot of these loops—especially if we offer a lot of different ways of analyzing data in the list of pets. Essentially, the code above is a completely standard loop except for the condition—the (p.Name == “Fido”) part. We can then imagine a function with the following form:
IList<Pet> FindPets(??? condition)
{
IList<Pet> result = new List<Pet>();
foreach (Pet p in Pets)
{
if (condition(p))
{
result.Add(p);
}
}
return result;
}
Now we need to figure out what type condition has. From the function body, we see that it takes a parameter of type Pet and returns a bool value. In C#, the definition of a function is called a delegate, which is also a keyword; for the type above, we write:
delegate bool MatchesCondition(Pet item);
As mentioned above, the return type is a bool, the single parameter is of type Pet, and the delegate is identified by the name MatchesCondition. The name of the parameter is purely for documentation. We can then rewrite the function signature above using the delegate we just defined:
IList<Pet> FindPets(MatchesCondition condition) {…}
We’ve managed to move the looping code for many common situations into a shared method. Now, how do we use it? We originally wanted to find all pets named “Fido”, so we need to define a function that does just that, matching the function signature defined by MatchesCondition:
bool IsNamedFido(Pet p)
{
return p.Name == "Fido";
}
In this fashion, we can write any number of methods, which check various conditions on Pets. To use this method, we simply pass it to the shared FindPets method, like this:
IList<Pet> petsNamedFido = FindPets(IsNamedFido);
IList<Pet> petsNamedRex = FindPets(IsNamedRex);
IList<Pet> houseTrainedPets = FindPets(IsHouseTrained);
This is better than the previous situation—in which we would have repeated the loop again and again—but we can do better. The problem with this solution is that it tends to clutter the class (Owner in this case) with many little methods that are useful only in conjunction with FindPets. Even if the methods are private, it’s a shame to have to use a full-fledged method as a kludge for instancing a piece of code to be called. The C# designers thought so too, so they added anonymous methods, which have a parameter list and a body, but no name. Using anonymous methods, we can replace the methods, IsNamedFido, IsNamedRex and IsHouseTrained, with the following code:
IList<Pet> petsNamedFido = FindPets(delegate(Pet p) { return p.Name == "Fido"; });
IList<Pet> petsNamedRex = FindPets(delegate(Pet p) { return p.Name == "Rex"; });
IList<Pet> houseTrainedPets = FindPets(delegate(Pet p) { return p.IsHouseTrained; });
Again, the keyword delegate
introduces a parameter list and body for the anonymous method.
All of the code above uses the generic IList
and List
classes. None of the looping code in FindPets
is dependent on the type of the list element except for the condition
. It would be really nice if we could re-use this code not just for Pet
s, but for any collection of elements. Generic functions to the rescue. A generic function has one or more generic parameters, which can be used throughout the parameter list and implementation body. The first step in making FindPets
fully generic is to change the definition of MatchesCondition
:
delegate bool MatchesCondition<T>(T item);
As with a generic class, the function’s generic arguments appear within pointy brackets after the identifier—in this case, the single generic parameter is named T
. Pet
has been replaced as the type of the parameter as well. In order to finish making FindPets
fully generic, we’ll have to pass it a list to work with (right now it always uses Pets
) and change the name, so as to avoid confusion:
IList<T> FindItems<T>(IList<T> list, MatchesCondition<T> condition)
{
IList<T> result = new List<T>();
foreach (T item in list)
{
if (condition(item))
{
result.Add(item);
}
}
return result;
}
We’re not quite done yet, though. If you look closely at the function body, all it does is enumerate the items in the parameter list
. Therefore, we can loosen the type-constraint of the parameter from IList
to IEnumerable
, so that it can be called with any collection from all of .NET.
IList<T> FindItems<T>(IEnumerable<T> list, MatchesCondition<T> condition) {…}
And … we’re done. Fully generic! Let’s see how that looks using the examples from above:
IList<Pet> petsNamedFido = FindItems<Pet>(Pets, delegate(Pet p) { return p.Name == "Fido"; });
IList<Pet> petsNamedRex = FindItems<Pet>(Pets, delegate(Pet p) { return p.Name == "Rex"; });
IList<Pet> houseTrainedPets = FindItems<Pet>(Pets, delegate(Pet p) { return p.IsHouseTrained; });
Though we’ve lost something in legibility, we’ve gained quite a bit in re-use. Imagine now that an Owner
also has a list of Vehicle
s, a list of Properties
and a list of Relative
s. You only have to write the conditions themselves and you can search any type of container for items matching any condition … all in a statically type-safe manner:
IList<Pet> petsNamedFido = FindItems<Pet>(Pets, delegate(Pet p) { return p.Name == "Fido"; });
IList<Vehicle> redCars = FindItems<Vehicle>(Vehicles, delegate(Vehicle v) { return (v is Car) && (((Car)v).Color == Red); });
IList<Property> bigLand = FindItems<Property>(Properties, delegate(Property p) { return p.Acreage >= 1000; });
IList<Relative> deadBeats = FindItems<Relative>(Relatives, delegate(Relative r) { return r.MoneyOwed > 0; });
Note: C# 2.0 offers this functionality in the .NET library for both the List
and Array
classes. In the official version, MatchesCondition
is called Predicate
and FindItems
is called FindAll
. It is not known why these functions don’t apply to all collections, as illustrated in our example.
Can we do something about the legibility of the solution from the last section? In C# 2.0, we’ve reached the end of the line. If you’ve been following the development of “Orcas” and C# 3.0/3.5, you might have heard of extension methods [3], which allow you to extend existing classes with new functions without inheriting from them. Let’s extend any IEnumerable
with our find function:
public static class MyVeryOwnExtensions
{
public static IList<T> FindItems<T>(this IEnumerable<T> list, MatchesCondition<T> condition)
{
// implementation from above
}
}
The keyword this
highlighted above indicates to the compiler that FindItems
is an extension method for the type following it: IEnumerable<T>
. Now, we can call FindItems
with a bit more legibility and clarity, dropping both the generic parameter and the actual argument (Pet
and Pets
, respectively) and replacing them with a method call on Pets
directly.
IList<Pet> petsNamedFido = Pets.FindItems(delegate(Pet p) { return p.Name == "Fido"; });
For brevity’s sake, the examples in this section assume use of the extension method defined above. To use the examples with C# 2.0, simply rewrite them to use the non-extended syntax.
We use anonymous methods to avoid declaring methods that will be used for one-off calculations. However, larger methods or methods that are reused throughout a class properly belong to the class as full-fledged methods. At the top, we defined a descendent of the Pet
class called Dog
. Imagine that each Owner
has not only a list of Pet
s, but also a list of Dog
s. Then we’d like to bring back our IsNamedFido
method in order to be able to apply it against both lists (copied from above):
bool IsNamedFido(Pet p)
{
return p.Name == "Fido";
}
Now we can use this method to test against lists of pets or lists of dogs:
IList<Pet> petsNamedFido = Pets.FindItems(IsNamedFido);
IList<Dog> dogsNamedFido = Dogs.FindItems(IsNamedFido);
The example above illustrates an interesting property of delegates, called contravariance. Because of this property, we can use IsNamedFido
—which takes a parameter of type Pet
—when calling FindItems<Dog>
. That means that IsNamedFido
can be used with any list containing objects descended from Pet
. Unfortunately, contravariance only applies in this very special case; the type of dogsNamedFido
cannot be IList<Pet>
because IList<Dog>
does not conform to IList<Pet>
. [4]
However, this courtesy extends only to predefined delegates. If we wanted to replace the call to IsNamedFido
with a call to an anonymous method, we’d be forced to specify the exact type for the parameter, as shown below:
IList<Dog> dogsNamedFido = Dogs.FindItems(delegate(Dog d) { return d.Name == "Fido"; });
Using Pet
as the type parameter does not compile even though it is simply an in-place reformulation of the previous example. Enforcing the constraint here does not restrict the expressiveness of the language in any way, but it’s interesting to note that the compiler relaxes the rule against contravariance only when it absolutely has to.
In the previous section, we created a method, IsNamedFido
instead of using an anonymous method to avoid duplicate code. In that spirit, suppose we further believe that having a name-checking function that checks a constant is also not generalized enough [5]. Suppose we write the following function instead:
bool IsNamed(Pet p, string name)
{
return p.Name == name;
}
Unfortunately, there is no way to call this method directly because it takes two parameters and doesn’t match the signature of MatchesCondition
(and even contravariance won’t save us). You can, however, drop back to using a combination of the defined method and an anonymous method:
IList<Pet> petsNamedFido = Pets.FindItems(delegate (Pet p) { return IsNamed(p, "Fido"); });
This version is a good deal less legible, but serves to show how you can at least pack most of the functionality away into an anonymous method, repeating as little as possible. Even if the anonymous method uses local or instance variables, those are captured along with the delegate; note that C# captures the variables themselves, so the code sees their current values when the delegate is eventually invoked, not copies made when it was created.
For comparison, Java does not support proper closures, requiring final
hacks and creation of anonymous classes in order to perform the task outlined above. Various proposals aim to extend Java in this direction, but, as of version 6, none have yet found their way into the language specification.
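To make the comparison concrete, here is a rough Java sketch of the “find pets named Fido” example; the MatchesCondition interface, the findItems helper and the Pet accessor are assumptions made for this illustration, not part of any library:
interface MatchesCondition<T> {
    boolean matches(T item);
}

// Somewhere in a method; the local must be final to be visible inside the anonymous class.
final String name = "Fido";
List<Pet> petsNamedFido = findItems(pets, new MatchesCondition<Pet>() {
    public boolean matches(Pet p) {
        return name.equals(p.getName());
    }
});
Even this small example needs an extra interface and a final local, which is exactly the ceremony being complained about.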
On a final note, it would be nice to have a cleaner notation for formulating the method call above—in which additional parameters to a function must be collected manually into an anonymous method. The Eiffel programming language offers such an alternative, calling its delegates agents instead [6]. The conformance rules for an agent matching a method signature like MatchesCondition<T>
are different, requiring not that the signature match perfectly, but only that all non-conforming parameters be provided at the time the agent is created.
Eiffel uses question marks to indicate where actual arguments are to be mapped to the agent, so in pseudo-C# syntax, the method call above would be written as:
IList<Pet> petsNamedFido = Pets.FindItems(agent IsNamed(?, "Fido"));
This is much more concise and expressive than the C# version. It differs enough from an actual function call—through the rather obvious and syntax-highlightable keyword, agent—but not so much as to suggest an entirely different mechanism. The developer is made aware that it’s not a regular method call, but a delayed one. C# could easily implement such a feature as pure syntactic sugar, compiling the agent expression to the previous formulation automatically. Perhaps in C# 4.0?
All in all, though, C#’s support for generics, closures and DRY programming is eminently useful and looks only to improve with upcoming features like LINQ and inferred typing, mechanisms that will improve legibility and expressiveness dramatically.
This reduces the expressiveness of the language, but C# forbids this because it cannot statically prevent incorrect objects from being added to the resulting list. Building on the example above, if we assume a class Cat
also descended from Pet
, it would then be possible to do the following:
IList<Pet> dogsNamedFido = Dogs.FindItems(IsNamedFido);
dogsNamedFido.Add(new Cat());
This would cause a run-time error because the actual instance attached to dogsNamedFido
can only contain Dog
s. Instead of adding run-time checking for this special case and enhancing the expressiveness of the language—as Eiffel or Scala, for example, do—C# forbids it entirely, as does Java.
For further information, the articles Generic type parameter variance in the CLR and Using ConvertAll to Imitate Native Covariance/Contravariance in C# Generics are also useful. For more information on closures in C#, see C#: Anonymous methods are not closures and The Power of Closures in C#.
Published by marco on 12. Apr 2007 20:47:55 (GMT-5)
If you’ve ever thought that PHP was too fast or used too little memory or that Java’s class encapsulation was too restrictive, boy has Quercus: PHP in Java got the solution for you. At last, PHP developers can enjoy the benefits of enterprise computing complete with abominable startup times, appalling refresh speeds and PermGen
errors every 15 minutes. And Java developers can finally leave their half-assed web frameworks behind and get behind the ultra-organized global namespace with a little something for everyone that is the PHP API.
This is clearly an April Fools prank that got out of control and is being delivered unconscionably late. There is no such thing as an idea so bad that the Internet can’t bring enough people together to make it happen.
Most Tapestry programming involves writing event handlers and operations on page objects. In order to execute these operations, you need access to properties of the form and properties of the session and... [More]
Published by marco on 31. Jan 2007 23:16:42 (GMT-5)
Updated by marco on 11. Feb 2007 21:59:51 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Most Tapestry programming involves writing event handlers and operations on page objects. In order to execute these operations, you need access to properties of the form and properties of the session and application in which the page resides. For convenience, developers can add references to all sorts of objects in the system using various forms of the @Inject*
annotation (like @InjectPage
, @InjectObject
and so on). Pages are declared abstract and, when instantiated, Tapestry extends the abstract class to fill in all of these injected objects and maintain proper initialization and linkage with the rest of the system.
As long as this works as expected, there’s no problem. However, when Tapestry can’t inject a property as specified, it fails silently instead of throwing an exception. Surely, the failure is logged somewhere, but turning on logging results in a flood of output that is all-too-quickly overwhelming. If an object cannot be injected, there should be an exception—else why would the page have tried to inject it? Because it would appreciate access to that object, but only if it’s not too much trouble?
Injection does not work if the property to inject also happens to override an existing method or implements an interface method. The example below shows what to watch out for:
Given the following interface:
interface IEditorInterface {
IPage getEditorPage();
}
If a page class implements this interface in the following way, by trying to get Tapestry to inject an implementation (which would be quite elegant), the page reference sometimes returns null and sometimes generates a duplicate declaration error caused by a race condition (see Random error acces[s]ing for more information).
public abstract class EditorPage extends BasePage implements IEditorInterface {
@InjectPage("Editor")
public abstract Editor getEditorPage();
}
As mentioned above, this would be quite elegant, but it doesn’t function reliably at all. In the case of @InjectPage
, it is quite shaky, whereas properties declared as abstract
, which are automatically implemented and managed by Tapestry, are always null or result in runtime abstract errors (class instantiated, but method not implemented).
To get around this problem, use two separate methods, one to implement the interface and the other to inject the object:
public abstract class EditorPage extends BasePage implements IEditorInterface {
public IPage getEditorPage() {
return getInjectedEditorPage();
}
@InjectPage("Editor")
public abstract Editor getInjectedEditorPage();
}
In the above example, the function name for the injected page was changed; it is also possible to adjust the interface to use a more specific name, like “getEditorInterfacePage”. This solution lets the class use the name “getEditorPage” for the more specific type (returning Editor instead of IPage).
Using Java 1.5, Tapestry 4.02, Hivemind 1.1.1
Hibernate is a persistence framework for Java. Among the many perks it purports to bring to the table is automatic versioning for objects in the database. That is, when saving an object to the database, it... [More]
Published by marco on 15. Jan 2007 17:31:51 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Hibernate is a persistence framework for Java. Among the many perks it purports to bring to the table is automatic versioning for objects in the database. That is, when saving an object to the database, it increments a version number. Any process that attempts to store a different version of the same object is rejected. This is all extremely flexible and can be added to a POJO using an annotation:
@Version
private int version;
Nice … a single annotation takes care of people overwriting each other’s data. The exercise of handling the ensuing StaleObjectStateException
in the user interface is left up to the reader.
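A minimal sketch of such handling might look something like the following; the session variable, the entity and the way the conflict is reported to the user are assumptions for illustration only:
try {
    session.saveOrUpdate(entity);
} catch (StaleObjectStateException e) {
    // Another transaction saved a newer version in the meantime;
    // tell the user instead of silently overwriting their changes.
    reportConflict("This record was modified by someone else. Please reload and try again.");
}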
Now, imagine that we have an object—call it a Book—in memory and we render it to a web page. On that page is a button which attaches more information to the object—say an Author—then saves and rerenders the same book in the page. The user can add and save authors or change other book properties and save the book to exit edit mode for that book. Though split over multiple page requests, as far as Hibernate is concerned, the following actions occur on that object [1]:
book.save();
book.addAuthor(new Author(getNameOfAuthor()));
book.save();
book.addAuthor(new Author(getNameOfAuthor()));
book.save();
book.save();
// exit edit mode …
This does not work. Hibernate raises a StaleObjectStateException
on the second execution of save()
because it never updated the version number in the object when it saved it the first time. That is, the automatically managed field version
is not synchronized with the object being saved when it is modified in the database. It’s not like Hibernate doesn’t know how to do this—fields marked with the @Id
annotation are updated as expected.
At this point, there are two things to do:
1. Debug Hibernate to find out why @Version isn’t treated the same as @Id.
2. Work around the problem in our own code.
After a bit of initial debugging pursuing choice (1), it became clear that choice (2) would be much more efficient (not least because the line numbers in the accompanying sources didn’t match the jar file).
The first step was to search online for this problem, but that was relatively fruitless: no one else seemed to have had this problem, didn’t regard it as a problem or hadn’t noticed it yet. With hundreds, if not thousands, of companies using Hibernate, it’s hard to believe that this feature is designed like this, or is fundamentally broken. The internet having failed us, we’re left to fix this problem ourselves.
As some of you may already have been dying to point out, the quick and dirty way of fixing this is to simply update the version number by hand. At the risk of making any programming purists ill, here’s that code:
book.save();
book.setVersion(book.getVersion() + 1);
However, this code assumes that it knows exactly how the automatic versioning feature of Hibernate works (or, rather, doesn’t) and how future versions will work. Not liking that solution, we decide that we’ll probably need to hit the database again; for that purpose there’s the refresh()
function.
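In code, the attempt looks roughly like this (a sketch using the save()/refresh() shortcuts described in the footnote at the end of this article; getVersion() is just the getter for the @Version field):
book.save();
book.refresh(); // delegates to the session's refresh(); we expect the version to be re-read here
int version = book.getVersion();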
…aaaaaand, the version number is still zero.
Taking a look at the database shows that the object with this id clearly has a version number of 2. Let’s turn on query logging to see what Hibernate is doing when it executes a refresh on our object. In the hibernate.cfg.xml
file, set the following property (it’s probably set to false in your default configuration):
<hibernate-configuration>
<session-factory>
…
<property name="hibernate.show_sql">true</property>
This time the console shows that Hibernate does indeed execute a select and does indeed select our version field. It, however, fails to apply that value to the object on which refresh()
was called.
That hack solution at the beginning of this section is starting to look mighty good. At this point, we’re left with the alternative of reloading the object from the database in order to simulate the seemingly non-functional refresh()
. Reloading the object does get the correct version and leaves us with an object that we can use for further editing & saving operations.
book.save();
book = book.refresh();
book.addAuthor(new Author(getNameOfAuthor()));
book.save();
book = book.refresh();
// exit edit mode …
With this solution, however, we’re forced to create a new object, reassigning the reference to book
. Though further reading online turned up references to tantalizing tidbits like load(Object obj,Serializable id)
, it only generated NonUniqueObjectExceptions
; no combination of evict()
and flushing the session was able to avoid this. After more fruitless investigation in this vein—and perusal of the Hibernate documentation, which, while good, doesn’t link to actual examples of these methods in action—the “fake” refresh outlined above was accepted as a general solution.
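For concreteness, here is a hedged sketch of what such a “fake” refresh might look like; the getSession() and getId() helpers are assumptions, not from the original code:
public Book refresh() {
    // Discard the stale in-memory state and load a fresh copy, version number and all.
    return (Book) getSession().get(Book.class, getId());
}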
Be aware, however, that any other references to the object represented by book
are not updated and will still have the wrong version
. In straightforward web applications, where the object is primarily referenced from a single page object, this is less likely to be a problem. In applications with more sophisticated operations—for instance, where the reference is part of a graph of objects being edited—the refresh()
outlined above is not a proper solution.
save()
and refresh()
execute the similarly named functions on the Hibernate session … these assumed shortcut functions make the code easier to read.
Using Java 1.5, Hibernate 3.2
If you are not already familiar with HiveMind, read Setting up a Service in HiveMind for an introduction. [1]
In the article mentioned above, we learned how to set up a new HiveMind service. What if we want to... [More]
Published by marco on 1. Jan 2007 23:29:29 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
If you are not already familiar with HiveMind, read Setting up a Service in HiveMind for an introduction. [1]
In the article mentioned above, we learned how to set up a new HiveMind service. What if we want to replace the implementation for an existing service? Is it even possible? Why would you want to do that? This article answers these questions in the context of a real-life example from one of our applications.
In our Tapestry applications, we use the ExternalLink because it provides a standard URL that can be bookmarked and refreshed across sessions (which is handy during development as well). This link relies on the ExternalService
, which is configured and created by HiveMind. We wanted to experiment with this implementation to include more flexibility as to which method would be called [2]; at first, we just replaced the class in our application and worked with it from there.
This worked fine until we wanted to be able to configure some basic properties we’d added to the service from our application’s configuration file. The easiest way to do this would be to give the external service access to the custom application service we’d created in our own HiveMind file. So, we’re faced with the problem of injecting an application-specific service into a library service from another module.
As usual with HiveMind, you just have to know that there is a specific element for doing this called <implementation>
. If you declare a service using this element instead of <service-point>
, HiveMind searches for the previous service declaration and replaces the entire creation clause for that service. That means that any service properties set by the initial declaration have to be copy/pasted into the implementation override or HiveMind won’t set those properties anymore.
The declaration below does this for the external service, adding in the application-specific property in addition to the two properties stipulated by the original implementation (copied from Tapestry’s configuration).
<implementation service-id="tapestry.services.External">
<invoke-factory>
<construct class="com.encodo.tapestry.ConfigurableExternalService">
<set-object property="responseRenderer" value="infrastructure:responseRenderer"/>
<set-object property="linkFactory" value="infrastructure:linkFactory"/>
<set-service property="applicationService" service-id="CustomApplicationService" />
</construct>
</invoke-factory>
</implementation>
Once you know how, it’s quite simple … and powerful. Any component of Tapestry can be replaced in this rather elegant way to make application-specific implementations of common services. If you’re simply extending existing functionality, you have to remember to copy the entire prior specification; if you’ve replaced the implementation wholesale, then it’s possible you no longer need the properties set by the prior implementation and can leave them out.
The examples assume a module id of com.encodo.customer.project; unqualified service ids are relative to it.
Using Java 1.5, Tapestry 4.02, HiveMind 1.1.1
If you are not already familiar with HiveMind, read Setting up a Service in HiveMind for an introduction. [1]
Almost every application is going to need to have information that is session-specific. This is... [More]
Published by marco on 1. Jan 2007 23:29:20 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
If you are not already familiar with HiveMind, read Setting up a Service in HiveMind for an introduction. [1]
Almost every application is going to need to have information that is session-specific. This is accomplished by adding a member to Tapestry’s application objects list and assigning it the proper scope. With a scope of “session”, HiveMind makes sure that each session in the web application has its own copy.
<contribution configuration-id="tapestry.state.ApplicationObjects">
<state-object name="CustomSessionService" scope="session">
<create-instance class="tapestry.CustomSessionService"/>
</state-object>
</contribution>
The tag names are quite straightforward in this case, with a CustomSessionService
instantiated for each session.
There is, within HiveMind, a concept known as “auto-wiring”, which purports to automatically make the connection between services based on interfaces: if one HiveMind service has a setter accepting an interface as declared by one and only one other HiveMind service, it is automatically applied to that service. That is, assume that the following service is the only one providing the ICustomApplicationService
interface to the application:
<service-point id="CustomApplicationService" interface="tapestry.ICustomApplicationService">
Any other service declared in HiveMind, whose implementation includes a public setter method taking a parameter of type ICustomApplicationService
(as shown below), will have this service automatically injected into it via that method.
public void setCustomApplicationService(ICustomApplicationService _service) {
applicationService = _service;
}
The exact rules for setter naming aren’t known [2]; it is recommended that you stick to using the id of the service prepended with “set”.
Though we have seen auto-wiring work and it works consistently, it’s not always immediately obvious where it won’t work; generally, we found it’s much better to just declare the connection explicitly for two reasons:
If the session needs further configuration or connection with other services, the following, though logical (especially in light of how services are configured), does not work:
<contribution configuration-id="tapestry.state.ApplicationObjects">
<state-object name="CustomSessionService" scope="session">
<construct object="service:CustomSessionService">
<set-service property="applicationService" service-id="CustomApplicationService" />
</construct>
</state-object>
</contribution>
The element <construct>
is not allowed within a state-object declaration. Instead, as with services, the state-object must be created using a factory. If you’re already familiar with the way service factories are declared, declaring a state object factory and connecting it to the session object is straightforward.
<service-point id="QMSSessionServiceFactory" interface="org.apache.tapestry.engine.state.StateObjectFactory">
<invoke-factory>
<construct class="tapestry.CustomSessionServiceFactory">
<set-service property="applicationService" service-id="CustomApplicationService" />
</construct>
</invoke-factory>
</service-point>
<contribution configuration-id="tapestry.state.ApplicationObjects">
<state-object name="CustomSessionService" scope="session">
<invoke-factory object="service:CustomSessionServiceFactory" />
</state-object>
</contribution>
For completeness, we’ll include an example implementation of the state object factory to show which function needs to be overridden:
public class CustomSessionServiceFactory implements StateObjectFactory {
private ICustomApplicationService applicationService;
public Object createStateObject() {
CustomSessionService result = new CustomSessionService();
result.setApplicationService(getApplicationService());
return result;
}
public ICustomApplicationService getApplicationService() {
return applicationService;
}
public void setApplicationService(ICustomApplicationService _service) {
applicationService = _service;
}
}
In this case, it’s the createStateObject()
function that must be implemented in order to create the session service. HiveMind sets the application service using setApplicationService()
(there is an error if this is not defined because the HiveMind configuration for the factory references the applicationService
property) and the factory simply passes the application service to the session service when it creates it. Granted, it would have been much more intuitive to simply be able to specify this all in the HiveMind configuration (as attempted above), but at least there is a workaround.
Web applications handle browser requests, so they generally create a small environment of objects to handle each request. Each request is handled in its own thread; the application can use HiveMind to create and connect these types of objects as well. For example, if you’re working with Hibernate, each request needs its own session—it’s a bad idea to keep them open across multiple requests—so you can delegate creation of this session to HiveMind. Unfortunately, the following attempt won’t get you very far at all (though it is pretty intuitive):
<contribution configuration-id="tapestry.state.ApplicationObjects">
<state-object name="CustomHibernateSession" scope="request">
<create-instance class="tapestry.CustomHibernateSession"/>
</state-object>
</contribution>
Instead, per-request objects are declared as full-fledged services, but use a different invocation model. The declaration looks like that for any other service, but with one small difference, the threaded model:
<service-point id="CustomHibernateSession" interface="tapestry.ICustomHibernatSession">
<invoke-factory model="threaded">
<create-instance class="tapestry.CustomHibernateSession"/>
</invoke-factory>
</service-point>
Remember that the names for state objects and ids for services can be anything, as long as they are unique within the union of all HiveMind configurations in your application (including those from Tapestry and any contributions your application uses).
As we’ve seen in both this article and Setting up a Service in HiveMind, HiveMind can be incredibly useful for dynamic configurations like web applications. However, it’s difficult to build on previous knowledge when tackling a new problem—the syntax for each type of situation is slightly different. Once you know how, it’s not bad, but getting there can be quite a battle.
The examples assume a module id of com.encodo.customer.project; unqualified service ids are relative to it.
Using Java 1.5, Tapestry 4.02, HiveMind 1.1.1
HiveMind is the IOC manager used together with Tapestry; it’s in charge of bootstrapping and connecting all of the myriad objects and services available to a Tapestry application. Applications based on... [More]
Published by marco on 19. Dec 2006 22:58:54 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
HiveMind is the IOC manager used together with Tapestry; it’s in charge of bootstrapping and connecting all of the myriad objects and services available to a Tapestry application. Applications based on Tapestry are encouraged to use it to configure their application- and session-level objects and services as well.
Once it works, it works well. Getting it configured in the first place—especially when new to HiveMind—is an exercise in patience. Larger errors are detected at startup, when HiveMind tries to parse the configuration you’ve entered. Once you’ve gotten a bit better at this, you’re building a configuration that parses correctly, but fails during execution when HiveMind fails to connect one object to another as you expected (usually due to a naming mismatch of some sort or another).
Every module starts with the <module>
tag, like this:
<module id="com.encodo.customer.project" version="1.0.0">
If HiveMind can’t locate a class specified in the configuration, it prepends the module id and tries again. A well-chosen module id makes the enclosed specification much cleaner because most of the canonical class name can be left off.
One of the most common concepts to configure in HiveMind is that of a global service. Each service is a POJO and is instantiated under a particular name by HiveMind. Taking it slowly, let’s look at the introductory tag of a service declaration:
<service-point id="CustomApplicationService" interface="tapestry.CustomApplicationService">
…
</service-point>
This creates a service identified by the id com.encodo.customer.project.CustomApplicationService
(expanded to include the module id). The service is understood to have the interface represented by the class, com.encodo.customer.project.tapestry.CustomApplicationService
(also expanded to include the module id). Do not confuse the two. Though the service id looks like a class name, it does not have to correspond to an existing class. The outer wrapper identifies the service uniquely and indicates which interface it can be expected to have. It remains to tell HiveMind how to create and configure the object that represents the service.
<service-point id="CustomApplicationService" interface="tapestry.CustomApplicationService">
<invoke-factory>
<create-instance class="tapestry.CustomApplicationService"/>
</invoke-factory>
</service-point>
Now HiveMind is happy and will create an instance of the class com.encodo.customer.project.tapestry.CustomApplicationService
when requested.
HiveMind services are loaded on-demand, in order to reduce startup time to only that time required to parse configuration files. If you need the service to be started immediately—for example, if it is a periodic task that is never accessed directly by the application—add the service to HiveMind’s EagerLoad
contribution point.
<contribution configuration-id="hivemind.EagerLoad">
<load service-id="CustomApplicationService"/>
</contribution>
Though recent versions of HiveMind allow a service to be declared without an explicit interface (as done above), HiveMind treats services declared thusly differently. Specifically, it instantiates the service object twice instead of just once. Since this is often an undesired side-effect, it’s better to just extract an interface from the class and use that when declaring the service.
<service-point id="CustomApplicationService" interface="tapestry.ICustomApplicationService">
<invoke-factory>
<create-instance class="tapestry.CustomApplicationService"/>
</invoke-factory>
</service-point>
Multiple interdependent services are configured slightly differently. Instead of simply telling HiveMind to create an instance of an object, you have to use the <construct>
tags, which allow nesting of <set-service>
tags (among others). The example below shows the application service, but now dependent on a fileManagerService.
<service-point id="CustomFileManagerService" interface="tapestry.ICustomFileManagerService">
<invoke-factory>
<create-instance class="tapestry.CustomFileManagerService"/>
</invoke-factory>
</service-point>
<service-point id="CustomApplicationService" interface="tapestry.ICustomApplicationService">
<invoke-factory>
<construct class="tapestry.CustomApplicationService">
<set-service property="fileManagerService" service-id="CustomFileManagerService"/>
</construct>
</invoke-factory>
</service-point>
Since both services are defined in the same module, one can refer to the other with its short name, CustomFileManagerService
instead of the canonical service id, com.encodo.customer.project.CustomFileManagerService
(and, yes, the ids are case-sensitive). The property name must also match the following method in CustomApplicationService
.
public void setFileManagerService(ICustomFileManagerService _service) {
fileManagerService = _service;
}
If it is not available, HiveMind will throw an exception. HiveMind also has no trouble connecting the file manager simultaneously to the application service, like this:
<service-point id="CustomFileManagerService" interface="tapestry.ICustomFileManagerService">
<invoke-factory>
<construct class="tapestry.CustomFileManagerService">
<set-service property="applicationService" service-id="CustomApplicationService"/>
</construct>
</invoke-factory>
</service-point>
The services will be able to access each other through the proxies created and assigned by HiveMind.
Be aware, though, that in the constructor of each service, the properties to be set by HiveMind are still null … because HiveMind hasn’t had the chance to assign them yet. This means that you can’t read properties from one service in order to configure the other service, and you can’t pass the service reference to other objects, because it will always be null in the constructor.
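The practical consequence is to touch connected services only from regular methods, never from the constructor. A rough sketch (the getRootPath() and storeFile() methods are invented for this illustration):
public class CustomFileManagerService {
    private ICustomApplicationService applicationService; // assigned later by HiveMind

    public CustomFileManagerService() {
        // Wrong: applicationService is still null at this point.
        // String root = applicationService.getRootPath();
    }

    public void setApplicationService(ICustomApplicationService _service) {
        applicationService = _service;
    }

    public void storeFile(String name) {
        // Right: by the time a service method is called, HiveMind has assigned the property.
        String root = applicationService.getRootPath();
        // ...
    }
}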
One way around this is to use a service factory to create a service. The pattern works as follows:
1. Declare a separate factory service.
2. Move the <set-service>/<set-property> calls to the service factory instead of the service.
3. Have the service’s <invoke-factory> refer to the factory instead of using <construct> or <create-instance>.
Let’s make a service factory for the file manager service.
<service-point id="CustomFileManagerServiceFactory"
interface="org.apache.hivemind.ServiceImplementationFactory">
<invoke-factory>
<construct class="tapestry.CustomFileManagerServiceFactory">
<set-service property="applicationService" service-id="CustomApplicationService" />
</construct>
</invoke-factory>
</service-point>
As you can see, the application service property is set on the factory now instead of the file manager itself. The file manager can now be created using the factory, like this:
<service-point id="CustomFileManagerService" interface="doclib.CustomFileManagerService">
<invoke-factory service-id="CustomFileManagerServiceFactory"/>
</service-point>
Instantiation of the file manager is delegated to the service factory, which creates the file manager object, assigns the necessary properties (like the application service) and returns it. A possible Java implementation is shown below:
public class CustomFileManagerServiceFactory implements ServiceImplementationFactory {
private ICustomApplicationService applicationService;
public ICustomApplicationService getApplicationService() {
return applicationService;
}
public void setApplicationService(ICustomApplicationService _applicationService) {
applicationService = _applicationService;
}
public Object createCoreServiceImplementation(ServiceImplementationFactoryParameters _factoryParameters) {
CustomFileManagerService result = new CustomFileManagerService();
result.setApplicationService(getApplicationService());
return result;
}
}
createCoreServiceImplementation
is called after the factory has been created and all other HiveMind properties assigned. Though quite a bit more work—and not at all intuitive—it’s at least possible to control configuration at quite a low level … once one knows how. Even the example as given won’t work because of one niggling little detail: HiveMind will parse the configuration without trouble, but will throw an exception when trying to create the service:
The solution is not immediately obvious, but, after digging around on the web and finding Bug with parameter-occurs?, it became clear that the default parameter setting was wrong. The fix is to allow the service factory to be instantiated with 0 parameters (instead of requiring 1):
<service-point id="CustomFileManagerServiceFactory"
interface="org.apache.hivemind.ServiceImplementationFactory">
parameters-occurs="0..n"
…
</service-point>
With this fix in place, HiveMind can create the service factory and the file manager service as expected.
Using Java 1.5, Tapestry 4.02, HiveMind 1.1.1
Every once in a while, when adding a new component to or changing an existing one on a Tapestry page, you’ll make a mistake. Most of the time, the exception handler page is pretty good; sometimes the exception... [More]
Published by marco on 14. Dec 2006 07:08:49 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Every once in a while, when adding a new component to or changing an existing one on a Tapestry page, you’ll make a mistake. Most of the time, the exception handler page is pretty good; sometimes the exception can be quite confusing. For example, suppose we have a custom component with a single property:
package com.encodo.blogs.samples;
public abstract class CustomComponent extends BaseComponent {
public abstract SomeObject getCustomParameter();
}
To use this component in a page, you just write the following:
<span jwcid="@CustomComponent" customParameter="obj"/>
This looks ok [1], but when loaded in a browser causes the following error:
org.apache.tapestry.BindingException
Error converting value for template parameter customParameter: No type converter for type com.encodo.blogs.samples.SomeObject
is available.
With this kind of error message, you’re ready to start imagining all sorts of horrible things: do you have to register a custom type converter somewhere? Does SomeObject have to implement Serializable?
The missing magic in the above example is ognl:
. Tapestry uses the Object-Graph Navigation Language to process references to Java code in its templates and page/component definitions. However, the default for HTML attributes is literal:
, which performs no extra processing. Since ognl:
is missing, obj
is simply a string, which Tapestry cannot convert to SomeObject
. To fix the problem, just add ognl:
before the object reference, like this:
<span jwcid="@CustomComponent" customParameter="ognl:obj"/>
This being such a common error, it would be nice if Tapestry could do some common-sense handling of it to help emit a better message. One simple way is to specify what, exactly, it was trying to convert to the target object. Compare to the following error message:
org.apache.tapestry.BindingException
Error converting “obj” (interpreted as “literal:obj”) for template parameter customParameter: No type converter for type com.encodo.blogs.samples.SomeObject
is available.
Once the developer sees how Tapestry interpreted the component declaration, it’s much easier to pinpoint the error. Now, for purely aesthetic reasons, let’s make this message more user-friendly:
org.apache.tapestry.BindingException
: The value for template parameter “customParameter”, given as “obj” and interpreted as “literal:obj”, could not be converted to com.encodo.blogs.samples.SomeObject
.
That error message, at least, should no longer inspire panic and desperate restarts of the testing server.
Using Java 1.5
One of the features we expect from a collections library is sorting. You should be able to use generic library mechanisms to sort a list of any kind of element. Most libraries include a generic sort
function,... [More]
Published by marco on 6. Dec 2006 21:32:03 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
One of the features we expect from a collections library is sorting. You should be able to use generic library mechanisms to sort a list of any kind of element. Most libraries include a generic sort
function, to which a comparison functor (object or function pointer) is passed. This functor is called repeatedly on pairs of elements until the list is sorted.
Let’s define the simple class we’ll use in the ensuing examples.
class A {
String fileName;
String getFileName() {
return fileName;
}
}
Now, let’s sort a List<A>
. Is there a sort
function on the list object itself? No. Why not? Legacy reasons. In order to avoid breaking existing code, Java has not made any changes to the List
interface. Ever. Therefore, you will have to search for any new functionality in the global function unit masquerading as a class [1] called Collections
.
There are two sorting functions defined in this class, shown below:
public static <T extends Comparable<? super T>> void sort(List<T> list);
public static <T> void sort(List<T> list, Comparator<? super T> c);
Neither one of these is exactly easy on the eyes and both include the wildcard (?) operator in their definitions. The first version accepts a List<T>
only if T extends Comparable
directly or a generic instantiation of Comparable
which takes T or a superclass as a generic parameter. The second version takes a List<T>
and a Comparator
instantiated with T or a supertype.
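The second form never appears in this article, so here is a hedged sketch of what it looks like, sorting by file name with an explicit Comparator (it assumes a List<A> named list and uses the class A defined above):
Collections.sort(list, new Comparator<A>() {
    public int compare(A left, A right) {
        return left.getFileName().compareTo(right.getFileName());
    }
});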
With our simple class A
above, it would be pretty easy to implement the interface to give the class a standard ordering. Naively, we might add the following:
class A implements Comparable {
String fileName;
String getFileName() {
return fileName;
}
public int compareTo(Object _o) {
return getFileName().compareTo(((A) _o).getFileName());
}
}
The compiler seems pretty happy with it, but actually calling sort()
with a List<A>
results in a warning:
Type safety: Unchecked invocation sort(List<A>) of the generic method sort(List<T>) of type Collections
If you’re using Eclipse, your “Quick-Fix” trigger finger is probably getting mighty itchy right now, but let’s avoid simply adding a @SuppressWarnings annotation above this function and try to find out why the class compiles, but the function call has a problem.
A quick search through Google Groups turns up Collections.sort() in Java 5, which reminds us that “[t]he Comparable interface takes a generic parameter”. Adding in the generic argument (as shown below) fixes the problem and gets rid of the warning.
class A implements Comparable<A> {
…
public int compareTo(A _o) {
return getFileName().compareTo(_o.getFileName());
}
}
On top of that, the generic version of Comparable
allows us to declare a type-specific compareTo
function and get rid of the ugly cast. All’s well that ends well, but it would be much nicer if the compiler could tell us that we are misusing Comparable
than to have to find out from some guy in a newsgroup.
Using Java 1.5
Develop your web application using Firefox. Validate your (X)HTML, validate your CSS, test your JavaScript. Tweak graphics, tweak layout. Get the client to sign off. Now that everything’s looking and working... [More]
Published by marco on 3. Dec 2006 23:16:24 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Develop your web application using Firefox. Validate your (X)HTML, validate your CSS, test your JavaScript. Tweak graphics, tweak layout. Get the client to sign off. Now that everything’s looking and working just right, it’s time to get it running in IE. Fire up IE and load the application.
?!?!
Lingering problems with PNG graphics, improperly interpreted CSS, imaginative approach to HTML layout—these are the types of problems you expect in IE. But IE refusing to load any page in the application at all? That’s a new one.
The first step was to load the TamperData extension with Firefox to get a look at the HTTP request and response headers. From this, it appeared that the server was using an HTTP 1.1-only feature by setting the Transfer-Encoding
to chunked
. Forcing Firefox to use HTTP 1.0 disables this feature and returns a Content-Length
instead. Go back to IE and force the HTTP compliance to version 1.0 and reload the page. Nothing. Same error.
So it doesn’t seem to be anything on the server … could it possibly be the content? This article, Internet Explorer Programming Bugs, yielded a wealth of information, including the following lead:
“Apparently interacting with innerHTML and possibly using other JScript functionality causes IE to pop up “Internet Explorer cannot open the Internet site http://example.com. Operation aborted.” messages after loading a page. … It seems that IE doesn’t like when somebody is trying to modify content of “document.body” by adding new elements (previous example) or by modifying its innerHTML (my case).”
Go back to the application and remove all scripts from the page, including the dojo libraries included by Tacos, an Ajax/scripting framework for Tapestry. Reload the page in IE and it loads without problems.
Getting closer.
Well, we can’t just shut off Javascripting for a modern web application, so let’s check for dojo/IE6 conflicts. Digging further turns up this post, [Dojo-checkins] [dojo] #557: IE 6 refuses to load page if dojo is loaded in HEAD and a BASE tag exists before that. The title says it all. Re-enabling the scripts and loading in Firefox confirms that there is indeed a base tag before the dojo scripts.
Ok. So that seems to be the problem. We use the @Shell Tapestry component to render the HTML head, which has a renderBaseTag
property. Set this property to false
and the page works in IE as designed [1].
We’re using Tapestry 4.0.1 and had already extended the @Shell
component to accept an array of scripts (analogous to the array of stylesheets it already accepts). Therefore, we simply changed the implementation to output the base tag at the end of the <head>
section, to avoid any future conflicts in IE6.
After a cursory examination of the sources for Tapestry 4.1, it seems that both the delegate
and ajaxDelegate
are rendered before the base tag, if present. This means that Tapestry 4.1 applications that require a base tag will also not function in IE6.
We’ll have to submit a patch in order to get this fixed once and for all.
Using Firefox 1.5, TamperData 9.8.1, IE 6 SP2, Dojo 0.4, Java 1.5, Tapestry 4.02 and Tacos 4.0.1
As of version 1.5, Java has blessed its developers with generics, which increase expressiveness through improved static typing. With generics, Java programmers should be able to get away from the “casting... [More]
Published by marco on 17. Nov 2006 16:34:38 (GMT-5)
Updated by marco on 1. Dec 2006 08:50:31 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
As of version 1.5, Java has blessed its developers with generics, which increase expressiveness through improved static typing. With generics, Java programmers should be able to get away from the “casting orgy” to which Java programming heretofore typically devolved. The implementation in 1.5 does not affect the JVM at all and is restricted to syntactic sugar, wherein the compiler simply performs the casts for you.
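As a quick illustration of what that sugar buys, here is a trivial, hedged example (not from the original article):
// Before generics: everything comes back as Object and must be cast.
List names = new ArrayList();
names.add("Fido");
String first = (String) names.get(0);

// With generics, the compiler checks the element type and inserts the cast for us.
List<String> typedNames = new ArrayList<String>();
typedNames.add("Fido");
String typedFirst = typedNames.get(0);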
Let’s build a class hierarchy and see how much casting Java saves us. Assume that you have defined a generic hierarchy using the following class:
public class DataObject {
private String name;
private List<DataObject> subObjects = new ArrayList<DataObject>();
public String getName() {
return name;
}
public List<DataObject> getSubObjects() {
return subObjects;
}
}
Well, now that’s an improvement! The class can express its intent in a relatively clear syntax without creating a specialized list class for the private field and result type. Assume further that there are various sub-classes of this DataObject
, which want to provide type-specific helper functions for their sub-lists. For example:
public class A extends DataObject {
}
public class B extends DataObject {
public List<A> getAs() {
return getSubObjects();
}
}
Though this is exactly what we would like, it won’t compile. The compiler instead returns the error:
Type mismatch: cannot convert from List<DataObject> to List<A>
In the next section, we’ll find out why.
For some reason, List<A>
does not conform to List<DataObject>
, even though A
inherits from DataObject
. The Generics Tutorial (PDF) Section 3 explains:
“In general, if Foo is a subtype (subclass or subinterface) of Bar, and G is some generic type declaration, it is not the case that G<Foo> is a subtype of G<Bar>. This is probably the hardest thing you need to learn about generics, because it goes against our deeply held intuitions.”
Indeed it is hard to learn and indeed it does go against intuitions. Is there a more specific reason why generics is implemented in this way in Java? Java’s competitor, C#, is limited in exactly the same way and the C# Version 2.0 Specification (DOC) or the Google HTML version offers the following explanation:
“No special conversions exist between constructed reference types other than those described in §6. In particular, unlike array types, constructed reference types do not exhibit “covariant” conversions. This means that a type List<B> has no conversion (either implicit or explicit) to List<A> even if B is derived from A. Likewise, no conversion exists from List<B> to List<object>.
“The rationale for this is simple: if a conversion to List<A> is permitted, then apparently one can store values of type A into the list. But this would break the invariant that every object in a list of type List<B> is always a value of type B, or else unexpected failures may occur when assigning into collection classes.”
The key word here is covariance. Neither Java nor C# supports it (except for return types, where there are no dangers involved) because of function calls that, in the Eiffel world, have long been called “catcalls”. Suffice it to say that both Java and C# have elected to limit expressiveness and legibility in order to prevent this type of error from happening. [1]
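In terms of the classes from this article, the error being prevented looks like this (a sketch; the assignment marked below is exactly what the compiler rejects):
List<A> as = new ArrayList<A>();
// If List<A> were allowed to conform to List<DataObject>...
List<DataObject> objects = as;   // ...this line (which Java rejects) would compile...
objects.add(new DataObject());   // ...a plain DataObject could then sneak into a list of As...
A first = as.get(0);             // ...and the failure would only appear here, at run time.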
Since Java has clearly stated that it neither condones nor supports what we would like to do, we can choose one of several options:
1. Keep returning List<DataObject> and just go back to casting to get the desired <A> when needed.
2. Find a way to return a List<A> without complaints from the compiler.
Since we’re stubborn, we’ll go with (2) above and dig a little deeper into generics. One solution is to create the list on-the-fly and transfer all the elements over to it.
public List<A> getAs() {
List<A> result = new ArrayList<A>();
for (DataObject obj : getSubObjects()) {
result.add((A) obj);
}
return result;
}
Mmmmm…lovely. It does the soul good and makes the heart swell with pride to write code like this. So clear and understandable—and such a lovely mix of new-style iteration with old-style casting! Methinks we’ll try again. In the first attempt, we returned List<DataObject>
from getSubObjects()
. Is there another result type we could use?
Java’s generics include something called wildcards, which allow a restricted form of covariance, in which the character ? acts as a placeholder for any class type at all. Wildcards are especially useful for function arguments, where they allow any list of elements to be passed. Imagine we wanted to pass in a list of DataObjects
to a function to be printed. Using wildcards, we can write the following:
public void printCollection(Collection<?> _objects) {
for (Object o : _objects) {
System.out.println(o);
}
}
The example above takes any collection at all and prints each of its elements. It only works because the compiler knows that any class that replaces ? must inherit from java.lang.Object
, so it can access any methods of that class from within the function. This is extremely limited since we can’t access any DataObject
-specific functions, so Java also includes bounded wildcards, which allow a wildcard to restrict the types of objects that may be used as the generic argument. Let’s rewrite printCollection
so that we can access DataObject
’s members without casting:
public void printCollection(List<? extends DataObject> _objects) {
for (DataObject o : _objects) {
System.out.println(o.getName());
}
}
Whereas this mechanism suffices for the example above, wildcards exact a hidden price: they do not conform to anything. That is, though List<A>
conforms to the formal parameter, List<? extends DataObject>
, you cannot then call add()
on it. That is, the following code doesn’t work:
public void extendCollection(List<? extends DataObject> _objects) {
_objects.add(new DataObject());
}
The parameter of _objects.add()
is of type ? extends DataObject
, which is completely unknown to the Java compiler. Therefore, nothing conforms to it … not even DataObject
itself!
Using the example above, we can recap the different approaches to using generics in Java:
1. List<DataObject> as the formal argument doesn’t allow us to pass a List<A>.
2. List<?> as the formal argument allows us to use only those functions defined in java.lang.Object on elements of the list.
3. List<? extends DataObject> allows us to pass any list of elements whose type conforms to DataObject, but limits the methods that can be called on it.
Let’s return now to our original example and see if we can’t apply our new-found knowledge to find a solution. Let’s redefine the result type of the getSubObjects() function to use a wildcard, while leaving the result type of the getAs() function, defined in B, as it was.
public List<? extends DataObject> getSubObjects() {
return subObjects;
}
However, as we saw in the third case above, this return type uses an unknown (unknowable) generic type and cannot be modified using add()
or remove()
. Not exactly what we were looking for. Let’s instead put it back the way it was and concentrate on using our newfound knowledge to cast (Yay! Casting! I knew you’d be back!) our result to the correct type. Here’s a naive attempt:
public List<A> getAs() {
return (List<A>) getSubObjects();
}
Ok. From the discussion above, it’s clear this won’t work and the compiler rewards us with the following error message:
Cannot cast from List<DataObject> to List<A>
Fine, let’s try again, this time throwing a wildcard into the mix:
public List<A> getAs() {
return (List<A>) (List<? extends DataObject>) getSubObjects();
}
Sweet! It compiles! We’re definitely on the home stretch now, but there’s still a warning from the compiler:
Type safety: The cast from List<capture-of ? extends DataObject> to List<A> is actually checking against the erased type List.
This is Java’s way of saying that you have done a complete end-run around its type-checking. The “erased type list” is actually List
because the compiler uses a strategy called erasure [2] to resolve generic references. The double cast in the example above compiles (and will run), but cannot be statically checked. At this point, there’s nothing more we can do, so we admit defeat the Java way and slap a SuppressWarnings
annotation on the function and continue on our way.
@SuppressWarnings("unchecked")
public List<A> getAs() {
    return (List<A>) (List<? extends DataObject>) getSubObjects();
}
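The warning makes more sense once you see what erasure leaves behind at run time. A quick illustration (mine, not the article's): two differently-parameterized lists share the same runtime class, which is why the double cast above cannot be checked statically.
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // Both print "class java.util.ArrayList": the type parameters are erased.
        System.out.println(strings.getClass());
        System.out.println(numbers.getClass());
        System.out.println(strings.getClass() == numbers.getClass()); // true
    }
}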
It’s clear that the decision to avoid covariance at all costs has cost the language dearly in terms of expressiveness (and, as a result, type-safety, as evidenced by the casting in the final example). It takes rather a lot of illegible code to express what, at the beginning of the article, seemed a rather simple concept.
Using Java 1.5
Given a recursive object structure in memory, what’s the best—and most efficient—way to render it with Tapestry? First, let’s define a tiny Java class that we’ll use for our example:
public class... [More]
Published by marco on 10. Nov 2006 14:20:21 (GMT-5)
Updated by marco on 10. Nov 2006 14:20:20 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Given a recursive object structure in memory, what’s the best—and most efficient—way to render it with Tapestry? First, let’s define a tiny Java class that we’ll use for our example:
public class DataObject {
private String name;
private List<DataObject> subObjects = new ArrayList<DataObject>();
public String getName() {
return name;
}
public List<DataObject> getSubObjects() {
return subObjects;
}
}
Imagine an application has built a whole tree of DataObjects
and wants to display them in a page. Since the page doesn’t know how many objects—or nesting levels—there are, it can’t be defined statically. This sounds like the perfect place to use a Tapestry component. Since each component must know about its context object (the DataObject
), there must be an instance of the component for each object. This sounds like the perfect place to use recursion.
Let’s take a crack at defining the template for a component named “DataObjectTree”, which has a single property, context
, which passes in the object to render [1]:
<span jwcid="@Insert" value="ognl:Context.Name">Context Name</span>
<div jwcid="@If" condition="ognl:Context.SubObjects.size() > 0">
<div jwcid="@For" source="ognl:Context.SubObjects" value="ognl:DataObject">
<div jwcid="@DataObjectTree" context="ognl:DataObject"/>
</div>
</div>
If it was that easy, you probably wouldn’t be reading this article, as it wouldn’t have been written. However, the Tapestry template parser is going to have extreme difficulties parsing this self-referential template. This value causes a stack overflow:
<div jwcid="@DataObjectTree" context="ognl:DataObject"/>
That’s a shame, but, with help from the blog entry, Recursive Tapestry Components, it’s possible to solve this problem with very little code (though the recursive solution would still be nicer).
Blocks to fool Tapestry
Tapestry has two components, Block and RenderBlock. Block defines a “floating” piece of template that is not rendered where it is defined, but is rather rendered in a particular place—or places—in a template by a RenderBlock
. The trick boils down to this: replace the recursive call in the component definition with a call to render a block defined in the page. That is, replace the offending line above with the line below:
<div jwcid="@RenderBlock" block="ognl:Page.Components.DataObjectBlock"/>
The DataObjectBlock
, in turn, is defined in the page template and includes a DataObjectTree
component.
<div jwcid="DataObjectBlock@Block">
<div jwcid="@DataObjectTree" context="???"/>
</div>
As shown above, there is a slight problem, as we need to pass a context to the nested DataObjectTree
component. Since we are once again in the page template, we can’t refer to any properties defined in the component. Therefore, the iterator object, DataObject
, used in the recursive (and non-functional) example above, is not available. We’ll have to access it some other way. The other way turns out to be by passing it from the RenderBlock
to the Block
. A RenderBlock
component accepts and stores all properties, so we’ll pass it the context in a “value” property (use any name you like).
<div jwcid="@RenderBlock" block="ognl:Page.Components.DataObjectBlock" value="ognl:DataObject"/>
How can the DataObjectTree
retrieve this property? It needs access to the RenderBlock
that included its parent Block
. That is, with a reference to its surrounding block, it can obtain a reference to the RenderBlock
and retrieve the value from it. The code below shows how to declare the Block
in the page template.
<div jwcid="DataObjectBlock@Block">
<div jwcid="@DataObjectTree" block="ognl:Page.Components.DataObjectBlock"/>
</div>
Tapestry will now give each instance of DataObjectTree
a reference to the Block
instance that encloses it. In order to complete this solution, you’ll have to write a Java class for the DataObjectTree
component itself. The component template refers to a Context
, which represents the DataObject
to display. When displayed from the block, this property is not directly set, so we will have to define code to retrieve it from the appropriate place. [2]
public abstract class DataObjectTree extends BaseComponent {
    public abstract Block getBlock();
    public abstract DataObject getContext();

    public DataObject getBlockContext() {
        if (getBlock() != null) {
            return (DataObject) getBlock().getParameter("value");
        }
        return getContext();
    }
}
If the component’s block
property is set, then the component was instantiated from within a block. In that case, the context is retrieved from the block’s parameters (all properties from the initiating RenderBlock
are automatically passed to the block as parameters). Otherwise, use the context set by the Context
property of the component. Below is the completed template for the component:
<span jwcid="@Insert" value="ognl:BlockContext.Name">Context Name</span>
<div jwcid="@If" condition="ognl:BlockContext.SubObjects.size() > 0">
<div jwcid="@For" source="ognl:BlockContext.SubObjects" value="ognl:DataObject">
<div jwcid="@RenderBlock" block="ognl:Page.Components.DataObjectBlock" value="ognl:DataObject"/>
</div>
</div>
As discussed above, the template now uses the BlockContext
instead of the context directly, so it uses the correct DataObject
. The page, on the other hand, uses the DataObjectTree
component twice, once from the DataObjectBlock
(as shown above) and once from the main template, as highlighted below.
<div jwcid="@DataObjectTree" context="Context"/>
<div jwcid="DataObjectBlock@Block">
<div jwcid="@DataObjectTree" block="ognl:Page.Components.DataObjectBlock"/>
</div>
Note how the instance in the main template gets a Context
representing the root of the tree and defined in the page itself. Though not as simple as the intuitive, recursive solution, the “Tapestry Way” doesn’t end up using too much code, though it did take a little while to figure out.
It’s kind of a shame that the page template has to not only use the component, but declare the block that it uses to render its nodes. Optimally, this part would also go into the component … but that takes us right back to the recursion problem we started with. Also, it’s kind of a shame that a separate component is required. Is there any way around these two warts?
Using the DataObjectTree
recursively is not possible, but Blocks
can seemingly be nested as much as needed. In fact, all that seems to be missing is a way to cleanly access the “value” passed to the Block
by the RenderBlock
. Leaving the definition within a separate component (for now), we can redefine the component HTML as follows:
<div jwcid="@RenderBlock" block="Components.DataObjectBlock" value="Context">
<div jwcid="DataObjectBlock@Block">
<span jwcid="@Insert" value="ognl:BlockContext.Name">Context Name</span>
<div jwcid="@If" condition="ognl:BlockContext.SubObjects.size() > 0">
<div jwcid="@For" source="ognl:BlockContext.SubObjects" value="ognl:DataObject">
<div jwcid="@RenderBlock" block="ognl:Components.DataObjectBlock" value="ognl:DataObject"/>
</div>
</div>
</div>
The entirety of the component’s rendering is contained within a Block
, which accesses the current context through the BlockContext
. The component’s main content is simply a RenderBlock
that renders that block for its Context
by passing it as the “value” for the block. Since the block is now defined within the component, it is accessed using Components.DataObjectBlock
rather than Page.Components.DataObjectBlock
.
The only remaining magic is to implement BlockContext in the component’s Java class. The component retrieves the current instance of the DataObjectBlock out of its component map and returns whatever was set in the “value” parameter. The page uses the getContext() property to pass in the initial context.
public abstract class DataObjectTree extends BaseComponent {
    public abstract DataObject getContext();

    public Object getBlockContext() {
        return ((Block) getComponents().get("DataObjectBlock")).getParameter("value");
    }
}
Now that’s clean! In fact, it’s almost the same as the original recursive solution, but uses the RenderBlock/Block
trick to get around Tapestry’s limitation. Though this example is now still defined in a component, you can just move the code and HTML into a page definition. Since the page already has its own context, you only need to copy in the getBlockContext()
method. The HTML can be copied directly and voila! A clean solution to recursive structures in Tapestry without defining any new components or using messy kludges.
@InvokeListener
Since the BlockContext is called many times from the component—real-world implementations will also likely need more such properties—we’d like to set up all necessary data when starting the DataObjectBlock. To do this, use @InvokeListener from the HTML to call a function on the component (or page) instance.
public abstract class DataObjectTree extends BaseComponent {
    private Object blockContext;

    public abstract DataObject getContext();

    public Object getBlockContext() {
        return blockContext;
    }

    public void updateBlockContext() {
        blockContext = ((Block) getComponents().get("DataObjectBlock")).getParameter("value");
    }
}
From the HTML template, simply call this function at the beginning of the DataObjectBlock
:
<div jwcid="DataObjectBlock@Block">
<span jwcid="@InvokeListener" listener="listener:UpdateContext"/>
…
</div>
This is a good pattern to follow to avoid having getters that are too computationally expensive.
For the non Tapestry-savvy, here are a few tips:
- ognl indicates that an Object-Graph Navigation Language expression is coming – it’s the mechanism Tapestry uses to script objects from a template. This language understands get/set and can read properties.
- For is a loop, If is a conditional and Insert adds text to the template. Text before the @ sign is an explicitly named component, which can be referred to by this name elsewhere in the template.
- For more information, visit Tapestry’s home page.
- Properties declared abstract are automatically given getters and setters and wired up by Tapestry in a dynamically generated descendent (that’s why the class is abstract as well).
Using Java 1.5 and Tapestry 4.0.2
See Finding Conforming Methods for part one of this two-part article.
The problem we’re working on is as follows:
Published by marco on 6. Nov 2006 21:30:31 (GMT-5)
Updated by marco on 1. Dec 2006 08:51:34 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
See Finding Conforming Methods for part one of this two-part article.
The problem we’re working on is as follows: given a target object, a method name and a list of actual parameters, find the matching method, determine from its annotations whether it may be called, and invoke it.
We will use annotations to mark up methods as callable or not. Given the Method
we obtained in part one, it shouldn’t be too hard to find its annotations. Simply pass the class of the desired annotation to getAnnotation()
; if the annotation was specified for that method, we check its contents to determine whether the method can be called or not.
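The article never shows the annotation’s declaration. For getAnnotation() to see it at run time, it must be declared with RUNTIME retention; here is a hedged sketch of what it might look like (the value() accessor and the second enumeration member are assumptions of mine, not taken from the article):
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical declaration; only CallLocation.FromWeb appears in the article's code.
enum CallLocation { FromWeb, FromScript }

@Retention(RetentionPolicy.RUNTIME) // without this, getAnnotation() returns null at run time
@Target(ElementType.METHOD)
@interface Callable {
    CallLocation value();
}
Checking the contents then amounts to something like callable != null && callable.value() == CallLocation.FromWeb.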
In part one, calling getConformingMethod(“giveCommandTo”, {new String(), new Assistant()}, Manager.class)
returns the overridden method from the Manager
class. Unfortunately, a call to getAnnotations()
on this method returns an empty list. Why?
The Java reflection API makes a distinction between annotations that appear directly on an element and all annotations for an element, including ancestors. These two lists can be retrieved from any AnnotatedElement
using the following methods:
Annotation[] getAnnotations();
Annotation[] getDeclaredAnnotations();
The documentation states that getDeclaredAnnotations()
returns “all annotations that are directly present on this element”, whereas getAnnotations()
returns “all annotations present on this element”. The key word here is directly, which is to be interpreted as stated above … for classes. For methods, there is no notion of inheritance per se in the reflection API. That is, if a method in a base class has an annotation and that method is overridden in a descendent, the signature for the method in the descendent returns empty lists for both getDeclaredAnnotations()
and getAnnotations()
.
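A quick way to see this for yourself, using the Person and Manager classes from part one (and assuming the Callable annotation has RUNTIME retention):
import java.lang.reflect.Method;

public class AnnotationLookupDemo {
    public static void main(String[] args) throws NoSuchMethodException {
        Method base = Person.class.getMethod("giveCommandTo", String.class, Person.class);
        Method overridden = Manager.class.getMethod("giveCommandTo", String.class, Person.class);
        System.out.println(base.getAnnotations().length);       // 1: the @Callable annotation
        System.out.println(overridden.getAnnotations().length); // 0: the override "loses" it
    }
}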
This doesn’t make any sense and directly contradicts the documentation. It seems that the all vs. declared distinction only holds for classes, even though it is defined for all elements. A quick look into the Java source shows that Method
inherits from AccessibleObject
, which implements the AnnotatedElement
interface. AccessibleObject
implements getAnnotations()
with the following code:
public Annotation[] getAnnotations() {
return getDeclaredAnnotations();
}
Alrighty then! Method
itself does not override this method, so it’s relatively clear that inherited annotations are not available from a method. In effect, the @Inherited meta-annotation only has an effect for classes, which is a shame. A quick check of the documentation for that annotation verifies this claim:
“Note that this meta-annotation type has no effect if the annotated type is used to annotate anything other than a class.”
So, once again, we’re on our own and must build the functionality in a custom function. The code below shows how to search a method and its inherited implementations for the Callable annotation:
private Callable getCallable(Method m, Object[] actualParameters) {
    Callable result = null;
    if (m != null) {
        result = m.getAnnotation(Callable.class);
        if (result == null) {
            Class<?> parent = m.getDeclaringClass().getSuperclass();
            if (parent != null) {
                Method superMethod = getConformingMethod(m.getName(), actualParameters, parent);
                result = getCallable(superMethod, actualParameters);
            }
        }
    }
    return result;
}
It’s not rocket science, but it involves a lot of digging around in the guts of Java reflection that shouldn’t be necessary.
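To tie the two parts together, here is a sketch of how the pieces might be combined; it assumes the method lives in the same class that defines getConformingMethod() and getCallable(), and the manager instance, command text and value() check are illustrative only:
import java.lang.reflect.Method;

// Sketch: assumes this method sits alongside getConformingMethod() and getCallable().
public void callIfAllowed(Manager someManager) throws Exception {
    Object[] args = { "file the report", new Assistant() };
    Method m = getConformingMethod("giveCommandTo", args, Manager.class);
    Callable callable = getCallable(m, args);
    if (m != null && callable != null && callable.value() == CallLocation.FromWeb) {
        m.invoke(someManager, args); // only methods explicitly marked as callable are ever invoked
    }
}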
Using Java 1.5
This is a two part post illustrating some tricks for working with the Java reflection API. Part two is available here.
Java reflection provides a wealth of information about your code. One interesting use of... [More]
Published by marco on 6. Nov 2006 21:30:22 (GMT-5)
Updated by marco on 1. Dec 2006 08:51:59 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
This is a two part post illustrating some tricks for working with the Java reflection API. Part two is available here.
Java reflection provides a wealth of information about your code. One interesting use of this information is to layer scriptability on top of an application, calling code dynamically. Suppose we wanted to do the following:
- Given a target object, a method name and a list of actual parameters, find the matching method.
- Check the method’s annotations to determine whether it may be called.
- Invoke it.
Let’s tackle step one first: a logical approach is to get the Class
for the target object and call getMethod()
with the method name and list of parameters to get the desired Method
object.
Sounds pretty easy, right? The Java reflection API puts a few stumbling blocks in the way.
getMethod()
finds only methods whose parameter lists are an exact match for the one given, not methods that could actually be called with that list of parameters. That is, it ignores polymorphism completely when performing a search.
For the following discussion, assume the following definitions:
public class Person {
    public void executeCommand(String s) {
    }

    @Callable(CallLocation.FromWeb) // [1]
    public void giveCommandTo(String s, Person p) {
        p.executeCommand(s);
    }
}

public class Assistant extends Person {
}

public class Manager extends Person {
    List<Person> underlings = new ArrayList<Person>();

    public boolean getIsInChainOfCommand(Person p) {
        return underlings.contains(p);
    }

    public void giveCommandTo(String s, Person p) {
        if (!getIsInChainOfCommand(p)) {
            throw new RuntimeException("Cannot order this person around.");
        }
        p.executeCommand(s);
    }
}
Managers can only order their own underlings around. We expect to be able to call giveCommandTo()
with a piece of text and an Assistant
, and Java—polymorphic wunderkind that it is—obliges. As mentioned above, getMethod(“giveCommandTo”, {new String(), new Assistant()})
[2] returns null
because Assistant
, though a conforming actual parameter, is not an exact match for the formal parameter.
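In the raw reflection API, the lookup takes the formal parameter types as Class objects and signals a missing exact match with NoSuchMethodException; a small sketch (mine) of the behavior described above:
import java.lang.reflect.Method;

public class ExactMatchDemo {
    public static void main(String[] args) {
        try {
            // Succeeds: String and Person are exactly the declared formal parameter types.
            Method exact = Manager.class.getMethod("giveCommandTo", String.class, Person.class);
            System.out.println(exact.getName());

            // Fails: Assistant conforms to Person, but is not an exact match
            // for the formal parameter, so no method is found.
            Manager.class.getMethod("giveCommandTo", String.class, Assistant.class);
        } catch (NoSuchMethodException e) {
            System.out.println("No exact match for (String, Assistant)");
        }
    }
}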
That’s a shame. I’m sure getMethod()
is much faster for this optimization, but it doesn’t really work for applications that would like to benefit from polymorphism. Any application that wants to search for methods that can actually be executed will have to do so itself. In English, we want to get the list of methods on a class and iterate them until the name matches the desired method. If the matching method has the same number of formal parameters as actual parameters and each of the formal parameter types isAssignableFrom
the corresponding actual parameter, we have a winner. The code below does this: [3]
protected Method getConformingMethod(String methodName, Object[] actualParameters, Class<?> cls) {
    Method[] publicMethods = cls.getMethods();
    Method m = null;
    int idxMethod = 0;
    while ((m == null) && (idxMethod < publicMethods.length)) {
        m = publicMethods[idxMethod];
        if (m.getName().equals(methodName)) {
            Class<?>[] formalParameters = m.getParameterTypes();
            if (actualParameters.length == formalParameters.length) {
                int idxParam = 0;
                while ((m != null) && (idxParam < formalParameters.length)) {
                    Class<?> param = formalParameters[idxParam];
                    if (!param.isAssignableFrom(actualParameters[idxParam].getClass())) {
                        m = null;
                    }
                    idxParam++;
                }
            } else {
                m = null;
            }
        } else {
            m = null;
        }
        idxMethod++;
    }
    return m;
}
A call to getConformingMethod(“giveCommandTo”, {new String(), new Assistant()})
returns a match where getMethod()
did not. Now that we have our method, we can check whether it can be called or not. For this, we retrieve the annotations on it.
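Once the Method is in hand, invoking it dynamically is a one-liner. A short sketch (assuming it sits in the same class as getConformingMethod(); the target object and command text are illustrative):
import java.lang.reflect.Method;

// Sketch: getConformingMethod() is the function defined above.
public void runScriptedCall() throws Exception {
    Person person = new Person();
    Object[] actual = { "file the report", new Assistant() };
    Method m = getConformingMethod("giveCommandTo", actual, Person.class);
    if (m != null) {
        m.invoke(person, actual); // dynamically calls person.giveCommandTo(...)
    }
}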
Continue on to part two.
Callable
annotation, which takes an enumeration as an argument. It is discussed in more detail in part two.
Using Java 1.5
This is the first of a two-part article on interfaces. Part two is available here.
Delphi Pascal, like many other languages that refuse to implement multiple inheritance, regardless of how appropriate the... [More]
Published by marco on 26. Oct 2006 10:37:41 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
This is the first of a two-part article on interfaces. part two is available here.
Delphi Pascal, like many other languages that refuse to implement multiple inheritance, regardless of how appropriate the solution often is, added interfaces to the mix several years ago. However, Borland failed, at the same time, to add garbage collection, so they opted instead for a COM-like reference-counting model, which automatically frees an interface when there are no more references to it. Any references to the object behind the interface are on their own.
This is not just a theoretical problem; it’s extraordinarily easy to provoke this situation. The definitions below show a simple interface and a class that uses that interface:
ISomeInterface = interface
procedure DoSomethingGreat;
end;
TSomeObject = class( TInterfacedObject, ISomeInterface )
procedure DoSomethingGreat;
end;
Now imagine that an application has a library of functions that accept an interface of type ISomeInterface
(like DoSomething
in the example below). Given the definition above, if it has an instance of TSomeObject
, it can magically profit from this library, even though the library doesn’t know anything about any of the objects in its inheritance chain. ProcessObjects
below uses this library function in the simplest and most direct way possible.
procedure DoSomething( aObj: ISomeInterface );
begin
  aObj.DoSomethingGreat;
end;

procedure ProcessObjects;
var
  obj: TSomeObject;
begin
  obj:= TSomeObject.Create;
  try
    DoSomething( obj );
  finally
    obj.Free;
  end;
end;
At first glance, there is nothing wrong with this code. However, executing it results in an access violation (crash). Why? The short answer is that references to the object (as opposed to references to the interface) do not increase the reference count on the object. In order to better illustrate this point, let’s unroll the DoSomething
function into ProcessObjects
to make the interface assignment explicit. This is shown below, with the reference count of obj
shown before each line:
procedure ProcessObjects;
var
  obj: TSomeObject;
  aObj: ISomeInterface;
begin
  obj:= TSomeObject.Create; // (0)
  try
    aObj:= obj; // (1)
    aObj.DoSomethingGreat;
    aObj:= nil; // (0) obj is freed automatically!
  finally
    obj.Free;
  end;
end;
With reference-counted objects, as soon as the reference count reaches 0, the object is automatically freed. Programming with this kind of pattern is, at best, a touchy affair, so most experienced Delphi programmers have learned one of two things about interfaces: either avoid them entirely, or use only non-reference-counted implementations.
A non-reference-counted interface implementation overrides the _AddRef
and _Release
methods to always return 1, so that the object behind the interface is never automatically released. This avoids a lot of crashes, but not all of them. Part two will show how to avoid the dreaded dangling interface.
Continue to part two.
This is the second of a two-part article on interfaces. part one is available here.
In part one, we saw how to use non-reference-counted interfaces to prevent objects from magically disappearing when using... [More]
Published by marco on 26. Oct 2006 10:37:34 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
This is the second of a two-part article on interfaces. part one is available here.
In part one, we saw how to use non-reference-counted interfaces to prevent objects from magically disappearing when using interfaces in common try…finally…FreeAndNil()
cases. Though this brings the interface problem under control, there is further danger.
A dangling interface is another problem that arises even when using non-reference-counted interfaces. In this case, the crash happens because an object has been freed, but there are still (often implicit) references to it in interfaces. Anytime a reference to an interface is removed—set to nil—the function _Release
is called on the object behind the interface. If this object has already been freed, there is a rather nasty crash deep in library code.
A nice use of interfaces is as a return type, so that objects from various inheritance hierarchies can be used from common code. To better illustrate this problem, consider the two interfaces below:
IRow = interface
  function ValueAtIndex( aIndex: integer ): variant;
end;

ITable = interface
  procedure GoToFirst;
  procedure GoToNext;
  function IsPastEnd: boolean;
  function CurrentRow: IRow;
end;
The two interfaces describe a way of generically iterating a table and retrieving values for each column in a row. Now, take a look at a concrete implementation for the table iterator. [1]
TRow = class( TNonReferenceCountedObject, IRow )
protected
  Values: array of variant;
public
  function ValueAtIndex( aIndex: integer ): variant;
end;

TTable = class( TNonReferenceCountedObject, ITable )
protected
  Index: integer;
  Rows: TObjectList;
public
  procedure GoToFirst;
  procedure GoToNext;
  function IsPastEnd: boolean;
  function CurrentRow: IRow;
end;
The implementation is not shown, but assume that each row allocates a buffer for its values and that the table allocates and frees its rows when destroyed. Assume further the naive implementation for the remaining methods—they are not salient to this discussion.
The example that follows iterates this table in a seemingly innocuous way, but one that causes a crash … sometimes. That’s what makes this class of problem even more difficult—its unpredictability. The lines of code that change a row’s reference count are followed by the reference count. This helps show what is happening behind the scenes and explains the ensuing crash.
procedure DoSomething;
begin
  rowSet:= CreateRowSet;
  try
    rowSet.GoToFirst;
    while not rowSet.IsPastEnd do begin
      val1:= rowSet.CurrentRow.ValueAtIndex( 0 ); // (1)
      val2:= rowSet.CurrentRow.ValueAtIndex( 1 ); // (2)
      rowSet.GoToNext;
    end;
  finally
    FreeAndNil( rowSet );
  end;
end; // (1) CRASH!
The code looks harmless enough; it is not obvious at all that CurrentRow
returns an interface. The two references to an IRow
are left “dangling” in the sense that the code has no references to them. But they exist nonetheless and will be cleared when exiting the function scope—after the objects to which they refer have been freed.
The way to fix this—and to work completely safely with interfaces—is to use only explicit references to interfaces. DoSomething
is rewritten below:
procedure DoSomething;
begin
  rowSet:= CreateRowSet;
  try
    rowSet.GoToFirst;
    while not rowSet.IsPastEnd do begin
      row:= rowSet.CurrentRow; // (1)
      try
        val1:= row.ValueAtIndex( 0 );
        val2:= row.ValueAtIndex( 1 );
      finally
        row:= nil; // (0)
      end;
      rowSet.GoToNext;
    end;
  finally
    FreeAndNil( rowSet );
  end;
end;
Interfaces are very useful, but Delphi Pascal’s implementation leaves a lot to be desired. It is possible to write completely safe code for them, but it takes a lot of practice and care. And, as seen in the examples above, interfaces can easily be hidden and mixed in with objects, so that crashes remain a mystery if the presence of a rogue interface is not detected.
TNonReferenceCountedObject is assumed to be an implementation of the IUnknown methods that prevents reference counting, as illustrated earlier in the article.
Any properties used from a Tapestry template have to be declared in the corresponding Java page class. It is highly recommended to declare these properties as abstract
; Tapestry implements them for you,... [More]
Published by marco on 25. Oct 2006 06:40:34 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Any properties used from a Tapestry template have to be declared in the corresponding Java page class. It is highly recommended to declare these properties as abstract
; Tapestry implements them for you, automatically including code that re-initializes each property when a page is re-used from the cache. If you implement the properties yourself in the customary Java getter/setter way, it is up to you to clear them in order to ensure that users can’t see one another’s data.
That said, there are a few bumps in Tapestry’s current implementation. For example, the @For
component requires a source list and an iteration value from that list.
The declaration in HTML looks like this:
<tr jwcid="@For"
source="ognl:DataObjectList"
value="ognl:DataObject"
element="tr">
<td>
<span jwcid="@Insert" value="ognl:DataObject.Name">Name</span>
</td>
</tr>
That is, the component iterates the list returned from getDataObjectList()
(Tapestry automatically chops off the “get” part when searching for a method), executing the body of the loop for each value. For each iteration, it assigns the current value to the DataObject
property, which refers to the iteration element in the body of the loop. The example above prints the name of each data object in the list.
The page class in Java looks like this:
public abstract class EditorPage extends BasePage {
    public abstract IDataObject getDataObject();
    public abstract List<DataObject> getDataObjectList();
}
Tapestry will automatically implement appropriate getters, setters and initializers according to the declarations. However, executing the code above results in the following error message:
Unable to read OGNL expression ‘<parsed OGNL expression>’ of $[Generated page class name]: source is null for getProperty(null, “Name”)
What happened? A quick check in the debugger indicates that the list is assigned, contains elements and none of them are null
. So why isn’t DataObject
assigned? If you look more carefully at the declarations in the page class, you’ll see that although getDataObjectList()
returns a list of DataObjects
, getDataObject()
returns an [I]DataObject
.
Tapestry correctly generated getters and setters for these methods, but failed to raise an error when the @For
component tried to assign a DataObject
from the list to the property of type IDataObject
. Instead it silently left it null
and the page crashed later with the wrong error message.
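Presumably the fix is simply to make the two declarations agree; a sketch of the corrected page class (assuming the list's element type is what the page really wants to expose):
public abstract class EditorPage extends BasePage {
    // Matching the element type of the list lets @For assign the iteration value.
    public abstract DataObject getDataObject();
    public abstract List<DataObject> getDataObjectList();
}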
Java supports immutable collections of all kinds, but not in the way you would expect. A naive implementation would declare the immutable (unmodifiable in Java parlance) interface as follows [1]:
interface... [More]
Published by marco on 25. Oct 2006 06:39:45 (GMT-5)
This article was originally published on the Encodo Blogs. Browse on over to see more!
Java supports immutable collections of all kinds, but not in the way you would expect. A naive implementation would declare the immutable (unmodifiable in Java parlance) interface as follows [1]:
interface UnmodifiableList<T> {
    T get(int index);
    int size();
}
There is no way to modify this list—the API is simply not available. That done, we can now create the modifiable version of the list as follows:
interface List<T> extends UnmodifiableList<T> {
    void add(T element);
    void remove(T element);
}
A class can now use these interfaces to carefully control access to a list as follows:
class SomeClass {
    private List<SomeOtherClass> list;

    UnmodifiableList<SomeOtherClass> getList() {
        return list;
    }
}
That would be pretty cool, right? Unfortunately, even if you declared these interfaces yourself, the example above does not work. Java’s generics support amounts to little more than syntactic sugar, so List<SomeOtherClass> does not conform to UnmodifiableList<SomeOtherClass>. There are several solutions to this problem:
- Add a method to the List interface that returns it as an unmodifiable list. This is probably the best solution, as the code for unmodifiability will be defined in one place, the implementing collection.
So that was fun, but how exactly does it work in Java, then? In addition to the limited generics, Java is further hampered by a legacy of old code. This means that they can’t (read: won’t) change existing interfaces because it might break existing code. Here’s how Java defines the two interfaces:
interface List<T> {
    T get(int index);
    int size();
    void add(T element);
    void remove(T element);
}
There is no second interface. All lists have methods for adding and removing elements—even immutable ones. Immutability is enforced at run-time, not compile-time. Pretty cool, huh? Not only that, but List
itself doesn’t even have a method to return an unmodifiable version of itself because Sun didn’t want to add methods to existing interfaces. Instead, you use a static method on the Collections
class to get an immutable version of a list.
class SomeClass {
    private List<SomeOtherClass> list;

    /**
     * Returns an unmodifiable list (treat as read-only).
     */
    List<SomeOtherClass> getList() {
        return Collections.unmodifiableList(list);
    }
}
The type system itself has nothing to say about modifiability. Any calling client can happily add elements to and remove elements from the result without any inkling that what they are doing is wrong. The compiler certainly won’t tell them; the Javadoc offers the only clue—in effect supplementing the type system with comments! When that code is called at run-time, Java will happily issue an UnsupportedOperationException
and smile smugly to itself for a job well done.
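A small illustration (mine, not the article's) of how late the failure shows up:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnmodifiableDemo {
    public static void main(String[] args) {
        List<String> frozen =
            Collections.unmodifiableList(new ArrayList<String>(Arrays.asList("a", "b")));
        frozen.add("c"); // compiles without complaint; throws UnsupportedOperationException at run time
    }
}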
Say it with me: backwards-compatibility is king!
Array as a placeholder here.
Published by marco on 21. Jun 2006 20:05:44 (GMT-5)
The Commodore PET (Wikipedia) first came onto the scene in 1977. Why is that interesting? As with most disciplines and careers, programmers like to engage in pissing contests to determine who’s suffered the most under the least expressive language under the most oppressive OS on the most restrictive hardware. One of the most important markers of experience is the “first machine I ever programmed on” metric. Many cut their teeth on BASIC on the Commodore 64; I cut mine on the machine to the left.
Until Wikipedia and the glorious Internets brought it back into sharp focus, the PET hovered fuzzily in memory as only a name (without a manufacturer), an achingly slow tape drive and scrolling lines of green text on a black screen. It turns out that the PET was also manufactured by Commodore—and several years before the Commodore 64 was even created. It understood a pretty basic BASIC, for which our 8- and 9-year-old minds wrote “Choose Your Own Adventure” GI Joe stories. Sadly, the sands of time have worn away these masterpieces, drifting them under the dunes of an obsolescent file format on lost media for which no reader exists. In that way, old-school programmers have another advantage over the young whippersnappers of today: if we’re lucky, all evidence of anything less than a complete mastery of computing has been mercifully lost to the unreachable past, leaving only a gleaming legacy of perfect code.
Yahoo... [More]
Published by marco on 17. May 2006 22:19:01 (GMT-5)
Updated by marco on 17. May 2006 22:19:37 (GMT-5)
Google and Yahoo are tripping all over themselves to help those of us with less time on our hands create reliable, usable web applications. They take different approaches, with Yahoo providing cross-platform JavaScript code and Google providing a new way of building web front-ends.
Yahoo kicked it off with the initial release of their JavaScript Libraries (earthli News), following up with a second release called AutoComplete, Windowing, Menu and More. The library looks really well-organized, has great documentation and examples and is built on a genuine cross-platform hierarchy (which are cross-referenced from everywhere in the documentation). Included are components like Tooltip, Panel, Dialog, and SimpleDialog as well as a complete Menu implementation.
Subscribe to the Yahoo UI blog to keep up-to-date on new developments.
Google has also released a toolkit that takes a different tack. The Google Web Toolkit lets you write your AJAX code in Java, which is translated to cross-browser JavaScript for deployment.
“a Java software development framework that makes writing AJAX applications like Google Maps and Gmail easy for developers who don’t speak browser quirks as a second language. … You write your front end in the Java programming language, and the GWT compiler converts your Java classes to browser-compliant JavaScript and HTML.”
Unlike Yahoo’s library, Google’s lets users build and debug their scripts in a familiar Java environment like Eclipse—so stuff like handling mouse events is even testable and debuggable without “alert” boxes. Google also kindly includes examples of their code in action; though the Dynamic Table example didn’t work in Opera, the Kitchen Sink did. The widgets used in the demos are documented (though not as nicely as Yahoo) and can be used directly, as with Yahoo’s components.
Subscribe to the Google Code Blog to keep up-to-date on new developments.
- It takes the fewest number... [More]
Published by marco on 4. May 2006 23:27:51 (GMT-5)
This is the simplest possible tutorial for creating convincing OS X–style Aqua effects using only vector graphics. The Ultimate Aqua Button takes a designer step-by-step through Fireworks to create a simple oval button. Here are the advantages listed in the tutorial:
- It takes the fewest number of steps (for a technique that doesn’t leave out any design elements)
- It uses fewer objects to complete the design
- All the elements of the button remain fully editable
- The final button is made entirely out of vector objects
This technique applies equally well to other shapes and can be ported to other vector graphics programs quite easily. It’s a very interesting technique in that it makes Aqua-style graphics extremely portable in size, color and so forth and shows how an operating system like OS X could save on memory (sacrificing CPU) by rendering graphic effects on the fly as vectors instead of using prerendered bitmaps.
Published by marco on 30. Apr 2006 13:01:26 (GMT-5)
Updated by marco on 30. Apr 2006 17:24:34 (GMT-5)
At long last, ISE Eiffel has released their development environment and libraries as Open-Source software, as announced in their press release (ISE Eiffel). The project is hosted on a wiki at the ETH and includes downloads for the most recent builds and nightlies (for Linux and Windows). The ISE implementation is the only one that fully supports the ECMA-367 standard, released in June of 2005. The download includes all development tools and libraries.
This is the language that Java and C# should be chasing instead of just trying to be better than one another or C++. Their recent introduction of generics [1] as glorified preprocessors, in which generic classes don’t follow expected inheritance rules, is one such recent debacle. Eiffel is the one OO language that asks the question, “how can we make the language more expressive?” before asking “how can we make the compiler easier to write?”
Eiffel the language has several features that it is relatively safe to say neither Java nor C# will ever have:
- Expanded (value) types: classes can be declared expanded, and the naturally expanded types, like INTEGER, BOOLEAN, FLOAT and so on, are simply optimized by the compiler. The magic happens in the compiler, though, allowing developers to inspect, use and design with INTEGERs in the same way that they design with LISTs, WINDOWs or other compound or reference types.
The generics recently included in Java and C# are a great improvement over the 1970s-style arrays previously available. They allow a much higher level of expressiveness and, most importantly, obviate a lot of iteration code and horrible, horrible casting. However, seemingly innocent limitations—like a lack of parameter covariance and lack of inheritance—cripple these implementations further. [4] Now that both C# and Java have some form of generics, we need to change the following clever tagline [5] from:
“OOP without generics is like a car that only turns left—sure you can go right, just do three lefts.”
to:
“OOP without parameter covariance is like a car that only turns left—sure you can go right, just do three lefts. [6]”
Because of this, the class models in Java and C# will also never offer another wicked feature of Eiffel, anchored types. [7]
Eiffel has many excellent online presentations illustrating Design-by-Contract and comparing itself to Java, C# and Delphi Pascal. They’re well worth a look.
There are some drawbacks to the IDE release, mainly for Mac users. The IDE runs under OS X and can even produce GUI applications using the EiffelVision libraries—but as X11 applications. These don’t integrate very well into OS X. However, given Eiffel’s vaunted interoperability with other languages, it should be possible to build program logic in Eiffel, gaining robustness and clarity from the design powers of the language, and to integrate those objects into OS X applications using Objective-C or Java (using JNI).
To be fair to C#, it doesn’t seem to be as content to linger in the past as Java. The Linq Project is a “set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations.” The C# 3.0 Language Specification (in Microsoft Word format) [8] describes the feature and its deep integration into the next version of C#. This feature is anchored by type inference, which allows developers to leave off explicit types in cases where the type is clear from the context. The lambda expressions available in C# 3.0 come from functional languages and allow chunks of executable code to be passed around a C# program (analogous to Eiffel agents, though not as flexible). The LINQ Project overview offers more details in a web page (instead of a Word document).
Which Java and C# don’t have anyway, but which can be emulated using assertions and “Do” routines. For example, for the non-virtual public function, f (p1: A), there is a virtual function do_f (p1: A), which is called from f, after it asserts its pre-conditions and before it asserts its post-conditions, as in the example below (shown in Java notation):
public void f(A p1) {
    assert( f_precondition );
    do_f(p1);
    assert( f_postcondition );
}

protected void do_f(A p1) {
    // to be implemented in descendents
}
This approach, though workable, does little to enhance clarity of design and does not solve the problem of inheriting from multiple such implementations to maximize reuse.
Given a class A[G]
, with descendent B[G]
(where G is generic), and X
with descendent Y
, we expect B[Y]
to conform to A[X]
. In Eiffel, it does; in C# and Java, it does not. Further, if A
declares function f (p1: G): G
, we expect the function in A[X]
to be expressed as f (p1: X): X
and the function in B[Y]
to be f (p1: Y): Y
. Since both Java and C# lack parameter covariance, it is a compiler error to use a generic type for a function argument. Since they do support function result covariance, the version in A[X]
is f (p1: X): X
and that in B[Y]
is f (p1: X): Y
. Inside the implementation of f
in B[Y]
, one must once again resort to a cast in order to get the type one would already be assured of in Eiffel.
Consider a class A
, with attribute a1: A
. The function to set this attribute could be declared as set_a1 (a2: like a1)
. If B
inherits this attribute and redefines it (using function result covariance) to a1: B
, the argument to function set_a1
is automatically redefined as well. If this is always desired, the original parameter in A
could be declared as a1: like Current
, where Current
is the equivalent of this
in C# and Java.
When Apple shipped Mac OS X 10.4 “Tiger” last year, it included the Dashboard and Widgets. Widgets are almost completely platform-independent, built with HTML, JavaScript and CSS and the Dashboard is a desktop-sized layer that could be called up instantly to show all... [More]
Published by marco on 14. Apr 2006 11:12:06 (GMT-5)
When Apple shipped Mac OS X 10.4 “Tiger” last year, it included the Dashboard and Widgets. Widgets are almost completely platform-independent, built with HTML, JavaScript and CSS and the Dashboard is a desktop-sized layer that could be called up instantly to show all installed Widgets. The “almost” above is deliberate since the release of the WebCore browser engine in Tiger included special hooks through which scripts could call system utilities (like executing local scripts or getting system information).
It also introduced the Canvas
object, through which JavaScript could perform drawing operations such as those found in the 2-D API of a modern operating system. The fancier widgets built their cool fading, flipping and drawing effects using the canvas (though the developers achieved a lot with transparent PNGs and CSS as well). The canvas was so obvious that Firefox and Opera quickly announced support for it. It has since evolved into a de-facto standard for a modern browser [1].
If you have one of these browsers, you should be able to enjoy the demonstration and sample code found in the Reflection Demo. A quick look at the source reveals a very plain HTML document with several embedded images. The images themselves do not contain the alpha-blended reflection seen beneath each one—that’s the effect applied by the “reflection” class present on each image. The accompanying JavaScript finds all elements with the “reflection” class and extends the image with a custom reflection built by extracting and compositing pixels from the image itself. It’s very fast and extends the power of CSS to image manipulation.
Image manipulation in <canvas> by Arve Bersvendsen has more examples and tips for using the canvas, including individual color channel manipulation and more dynamic effects, like hard and soft spotlights. The effects are rendered extremely quickly and fluidly and bring a gee-whiz effect to simple web pages not seen outside of Flash. These effects could also be achieved with SVG [2] or Flash, but it’s nice to be able to stay in the familiar world of HTML/CSS/JS.
The canvas opens up a new world of integrated, dynamic effects to web developers—effects that 90% of the market will never see, since Internet Explorer shows no signs of including new technologies in any soon-to-be-released version. That means that major sites are unlikely to start dazzling you with download-friendly effects (effects are rendered locally without extra images) to improve your browsing experience. Though many of the uses of the canvas will likely be in making advertising more annoying, it is also highly likely that web application interfaces can be drastically improved in their usability and gesturing (indicating what happened where).
On the other hand, the demos look really nice; specialty web sites that don’t mind spending money on an effect that only appears for 10% of its users will start using the canvas to stand out from the rest of the crowd.
The Yahoo! User Interface Library offers all of... [More]
Published by marco on 21. Feb 2006 22:20:35 (GMT-5)
If you’re looking for good advice on JavaScript programming, take a look at javascript.faqts, which offers a massive list of questions and answers about JavaScript, including many samples and snippets organized by topic.
The Yahoo! User Interface Library offers all of Yahoo’s JavaScript controls, packaged and ready for download as Open Source. It includes GUI-level components for handling drag & drop, or building treeviews or calendars as well as low-level components for managing AJAX connections and browser events. On top of that, they’ve also released the Yahoo Design Pattern Library, which offers hints and strategies for implementing common web application tools, like breadcrumbs or auto-completion.
Whereas there are other cross-platform libraries out there, Yahoo has done an excellent job of preparing their tutorials.
Published by marco on 11. Dec 2005 00:10:20 (GMT-5)
Why Ajax Sucks (Most of the Time) by Jakob Nielson is a critique of Ajax that borrows almost all of its text from a critique of HTML frames made several years ago. The author claims the article is a spoof, but, given that the complaints made about frames were valid at the time — and still are — and that the complaints are just as valid for the current batch of web applications, it’s hard to see what’s so funny about this “spoof”.
There are those who argue that frames in fact have not died out, since they are used on almost every web site to house advertising or as invisible containers for dynamic content. That argument misses the point of the initial critique, which derides the use of frames for navigational purposes. Sites using frames for navigation have gone the way of the dodo — for exactly all of the reasons mentioned all those years ago.
I’m an Opera user and I have mouse gesturing hard-wired into my brain. I sweep the mouse from right to left without even thinking about it. I expect to go back to the previous page in an instant, as Opera also provides the convenience of using its cache to serve up recent pages — unlike Firefox and IE, which still see the need to go online to check whether the page I was just at one second ago needs to be updated.
When I take this — admittedly spoiled — behavior to an Ajax-enabled site like Google Mail, I’m crippled. I frequently sit in front of an empty page because, as mentioned in the article above, back simply does not work. Sweeping the mouse desperately from left to right (to go forward again) brings more white pages, with a small “Loading…” hint in the top left of the page. A glance at the bottom of the page shows no Opera progress bar, which means Google is not loading anything.
The only tool I have left? Reload. The Google Mail page weighs in at over 300KB (Opera’s progress bar also conveniently shows how much page it’s loading), so that’s no small chore. If you’ve managed to go back too much, you might have logged yourself out, in which case you get to completely start over.
It is, indeed, a usability catastrophe. The ideas in the UI are interesting, in that everything happens inside one window, with more and more data loading into the view without replacing the old data. It’s just impossible to navigate.
Published by marco on 10. Jun 2005 08:23:39 (GMT-5)
Mapping Google is an in-depth examination of Google Maps, a new web application that searches the US graphically. There are follow-up articles in Making the Back Button dance and Still more Fun with Maps. The series of articles covers the techniques Google used to bring a full-fledged, usable application to a web browser.
What’s so special about it? It feels like a desktop application.
Since its introduction, Google has added a satellite map feature that overlays the map with satellite data; zoom in as far as possible and see individual cars parked next to the store you’re looking for.
It’s really cool, it’s really fast and it’s the future of web applications.
Published by marco on 20. Jan 2005 21:46:57 (GMT-5)
Updated by marco on 14. Apr 2006 11:25:51 (GMT-5)
Printing XML: Why CSS Is Better than XSL by Håkon Wium Lie and Michael Day (O'Reilly XML.Com) responds to a line drawn in the sand in webarch.pdf by Norman Walsh, a noted XSLFO proponent. In that paper, Walsh said:
“…web browsers suck at printing … they all suck. And CSS is never going to fix it. Did you hear me? CSS is never going to fix it.”
That’s pretty much all he has to say about CSS in that paper; it actually discusses a general style sheet, written in XSL, that transforms an HTML document to printable FO. If you motor on over to that link, even with Firefox’s helpful formatting, well-written XSL is still not exactly legible. The guy, however, knows what he’s talking about and claims to be able to format this Architecture of the World Wide Web, Volume One into a printable FO (which converts easily to PDF) using it.
Now comes the interesting part. As with all standards, we have to ask: “how good is good enough?” Håkon Lie, senior technology officer at Opera and longtime contributor to the CSS standard for the W3C, has “just used CSS to style a 400-page book”. Apparently, CSS is “good enough” for printing at least one book and, here’s the authors’ point: it’s much easier to read and, at only about 200 lines*, much shorter.
*The link from Slashdot includes a 100-line version, which comes from the Prince distribution mentioned below.
The next natural question is: what the hell do I view this in? Browser support for printing is pretty sketchy, at best (though Opera’s printing in the 8.0 beta has gotten much, much better). Prince to the rescue! It generates PDF from XHTML and CSS. You can download it for Linux or Windows, and it includes support for SVG, PNG, JPEG and TIFF graphics files.
If you download the Prince Alpha and run the sample document (which, purely by coincidence, is the same Web Architecture document referenced above) with the ‘forprint.css’ stylesheet, it generates a really nice-looking PDF in just seconds. Granted, PDF is a bit more verbose a format, using 1.5MB instead of the ~200KB for the original HTML file.
There are several examples of the XSL and CSS used to generate various parts of the document. The CSS, in every case, is much clearer and easier to write. I found them more interesting as examples of what you can do with CSS. You can see most of these in action if you download Prince.
@page :left {
@bottom-left {
content: counter(page);
}
}
This puts the current page number in the bottom-left corner of left-hand pages when printed. Cool.
ul.toc a:after {
content: leader('.') target-counter(attr(href), page);
}
This places the page number of a link’s target next to the link’s text in a table of contents (for example), using a ‘leader’ composed of periods.
body { column-count: 2; column-gap: 8mm; }
This one should be obvious, no? I didn’t know columns were that easy with CSS (though I’m not sure which version this requires).
The interesting thing to take away from all this is not that CSS is better than XSLFO or vice-versa. It’s that CSS is capable of generating very nice printed output from an HTML or any XML document. It’s also much easier to learn and maintain than XSLFO. That makes it a far more accessible formatting language. Some people on Slashdot were pointing out that “you are only going to write it once”. You never write something once and leave it alone. Maintainability and extensibility are paramount for most uses.
Also make sure to remember that CSS is not a wholesale replacement for XSL.
“XSL is a Turing-complete language which, in principle, can be used for all programming tasks and is particularly suited for document transformations. Styling documents is only one of many things XSL can do. CSS, on the other hand, has been developed with only one task in mind: styling documents.”
The thing to consider is whether your task needs the complexity of writing XSL transformations or whether you just need to style documents. If you only need to style, CSS is a comfortable and powerful alternative to XSL.
Published by marco on 16. Jan 2005 14:04:15 (GMT-5)
If you make websites, pay careful attention to the XMLHttpRequest JavaScript object: it’s going to change everything about web application interfaces. Web pages can use this object to make an “in-place” request for data from another URL, then inject the results of the request (with optional post-processing in JavaScript) into the current document.
All without refreshing the page.
Google introduced the first really noticeable implementation of a web application using this technology in GMail. It was so noticeable, it prompted Opera to finally implement the object in their JavaScript engine and release a new version of their browser. That means that it’s also uniformly supported on all browsers on all platforms (Safari 1.2x, Mozilla 1.x, Opera 7.6x, IE 5.x).
Dictionary is a simpler example of this technology, which also explains how it works. The gist is that you create an XMLHttpRequest object in your page and tell it to retrieve data from a URL. When the response arrives, it triggers an event handler in your JavaScript and you can work with the response text. In Dictionary’s case, the response is preformatted HTML, which is assigned to the innerHTML property of a given DIV container in the page (which is pretty much the standard way of injecting/updating content in the DOM of a web page).
The request is issued when the user types a new letter into the text box on the page; the page then displays the top ten definitions for the text entered.
Other uses for this construct are limitless. Gone are the days of horrible workarounds for status pages during long operations; a page can request the status from the server directly, which queries the running process for an update and returns text describing the status. I would imagine that the content returned in the response should involve as little client-side processing as possible to avoid having to write complex JavaScript (server-side languages are usually easier to debug).
Similarly, a page no longer needs to reload entirely when loading content for interdependent drop-down boxes. Nor does it need to load all possible variations in JavaScript when the page is initially loaded (both horribly limited solutions, bandwidth- and usability-wise). Simply request the new contents of the dependent drop-down when the user makes a selection in the “master” drop-down.
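Here’s a rough sketch of the server side of that pattern (the servlet, URL and parameter names are all hypothetical, not taken from any of the articles above): the endpoint only has to return a preformatted HTML fragment, which the page then assigns to a container’s innerHTML when the response event fires.

import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical endpoint, e.g. GET /cities?country=CH, called by the page's
// XMLHttpRequest whenever the "master" drop-down changes.
public class DependentListServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        String country = request.getParameter("country");
        // Geography is a stand-in for whatever data source the application really uses.
        List<String> cities = Geography.citiesFor(country);

        // Return a preformatted HTML fragment; the client simply assigns it to the
        // innerHTML of the dependent drop-down's container.
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        for (String city : cities) {
            out.println("<option value=\"" + city + "\">" + city + "</option>");
        }
    }
}

Keeping the response dumb like this pushes all the interesting logic to the server, which, as mentioned above, is usually easier to debug than a pile of client-side JavaScript.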
I, for one, am looking forward to the next generation of web applications based on this technology. For more information, see Apple’s developer documentation for XMLHttpRequest.
Published by marco on 10. Jan 2005 21:54:10 (GMT-5)
Hey, I know Joel Spolsky is well-read in the industry and he often has some interesting topics, but his latest article, Advice for Computer Science College Students, is way more over the top than it needs to be. Maybe he thinks that, since he’s addressing people about to go to college (or those already in college who have not yet chosen a major), he needs to go all MTV on us and “get all up in our faces”, not missing a single opportunity to “dis the man”.
Whatever; it’s annoying.
He takes needless potshots at all majors not immediately important to building a successful IT business, deriding anything that’s not immediately applicable or “useful” as a waste of time. “What are you going to do, major in History?” is one such throwaway comment that’s probably supposed to incite a snigger of contempt for anyone who can’t use a computer, but, yeah, actually it would be nice if some Americans learned history. Americans not knowing history causes a lot more problems in the world than Americans not knowing how to program.
He also disguises ad hominem arguments as legitimate critiques of subject matter — discarding entire subjects because he got a shitty teacher. I too took cultural anthropology and I had a great teacher (Doug Raybeck at Hamilton College). I liked the course; a good teacher makes a big difference. Getting to know how other people tick and learning about other cultures is another one of those things that might help build a little bit of empathy and understanding in this world, which is so sorely lacking in the upper middle class that is his essay’s target. Computer geeks are already smug and superior enough, for God’s sake. Encouraging them to listen to their worst instincts is a terrible idea. I did notice that he encourages people to take the course anyway, but in the way that your dad encourages you to eat vegetables when you know that he hates eating them too. Kind of a grin, grin, wink, wink, just do it to please your mom, we’ll go eat cookies in the basement later kind of way.
So you’re not supposed to learn computer science, but you’re supposed to “[l]earn C before graduating” because “it is still the lingua franca of working programmers”. “Java … Python [are] trendy junk” that are being taught to deliberately mislead you. Don’t be fooled by high-level abstraction; listen to uncle Joel and start right off from the beginning. Standing on the shoulders of giants is for those idiot “scientists”. Sure, I also think you should have a passing knowledge of C, but knowing how the machine processes instructions is unnecessary for most of the software being written today. I mean, Joel’s company’s main product is a web application, for Christ’s sake. Just how close does he think that runs to the processor? I hope all of his developers are properly optimizing the stack in all of their routines. He compares a programmer who doesn’t know C with a “medical doctor who doesn’t know basic anatomy”. I think it’s much closer to a medical doctor who doesn’t know how mitochondria exchange food and oxygen across a cell membrane. Despite that massive gap in her knowledge, my doctor can probably still tell me if my leg’s broken (and set it).
Despite that diatribe, may I offer a few corollaries to the rule about learning C before you graduate:
The advice to get an internship is good, though. Internships are an ad-hoc replacement for the fact that there is no apprenticeship program in the US. He should be using his advocacy position to push for better education in the first place, instead of being happy to hire people out of a system in which “elite schools” that cost “$160,000” don’t even teach you how to program. The main problem is that you do spend time programming in computer science (in most of the courses I took) … you just don’t spend any time designing. Writing the code for a design is pretty much incidental in most cases. Designing good code is what takes practice, experience and guidance.
Mostly, though, he feels that the contempt that programmers feel for the part of the world that can’t (or doesn’t want to) write “while loops” is completely justified. After all, computer scientists are useless because even he, a mere developer, “found a mistake in [dynamic logician] Dr. Zuck’s original proof” after only “a couple of hours” and he “got an A” in a Cultural Anthropology class he found less exciting than “watching grass grow”. I think he sums it up well enough himself with:
“The moral of the story is that computer science is not the same as software development.”
His implication is that it is far, far worse than software development and just slows development down. Think Bush’s attitude toward all forms of science and you’ll get the idea. Personally, I think we could do with more computer science in development, especially for tools and libraries that we all have to live with for years and years and years.
Take the newest buzzword of the moment, Generics, which we can thank Microsoft for inventing. Generics are an extremely useful mechanism for specifying a design very precisely while at the same time reducing the code required by at least a third. C# shipped initially without this feature, as did Java. Generics were not new when these languages were introduced (despite my sarcasm above). Eiffel has (and had) an extremely well thought-out and well-built implementation and C++ had a poor man’s preprocessor that didn’t fit into its existing type system at all.
Eiffel is a language designed by a computer scientist for practical use. C++ is an object-oriented version of C, “the lingua franca of working programmers”. Guess which version C# copied?
You’d be wrong if you guessed either one. C# copied Java’s generics, which use the semantics and type compatibility rules of the much weaker system used in C++. Expressiveness of the language has been lost because a compiler-friendly solution was chosen instead of a programmer-, or even computer scientist-friendly one.
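To make the “lost expressiveness” claim concrete, here’s a small Java sketch (my own example, not one from the article): with erased generics the type parameter no longer exists at runtime, so you can neither create a T nor test for one, and the standard workaround is to pass a Class object around by hand.

public class ErasureDemo<T> {
    // Both of these are illegal under erasure, because T is gone at runtime:
    //   T create() { return new T(); }                                        // does not compile
    //   boolean isListOfT(Object o) { return o instanceof java.util.List<T>; } // does not compile

    // The usual workaround: smuggle the type in as a Class object.
    private final Class<T> type;

    public ErasureDemo(Class<T> type) {
        this.type = type;
    }

    public T create() throws Exception {
        return type.newInstance();
    }

    public static void main(String[] args) throws Exception {
        ErasureDemo<StringBuilder> demo = new ErasureDemo<StringBuilder>(StringBuilder.class);
        StringBuilder sb = demo.create();
        System.out.println(sb.append("created via Class token"));
    }
}

Eiffel’s constrained genericity can state that kind of creation requirement directly in the type system; in Java it remains a hand-rolled convention the compiler never checks.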
Maybe Sun and Microsoft’s marketing departments had something to do with it as well. Maybe my point is that developers and computer scientists shouldn’t fight, but instead unite against the common foe: marketers.
Java provides numerous examples where a language seems to have completely lost its way — its proponents aren’t even qualified to discuss the basic tenets of programming languages. Take a look at these Tech Tips from 2005-01-05. The first one, about VarArgs, spends seven whole pages describing and providing pedantic examples for a feature that provides poor-man’s support for tuples in one specific, limited case (where the arguments are all of the same expanded type). One look at Eiffel’s tuple support (snobby, ivory tower, CS bullshit language) and you’ll be wishing Java had done its homework.
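For context, here’s what that limitation looks like in practice (my own toy example, not one from the Tech Tip): varargs is just sugar for a trailing array, so every argument has to share one component type, and anything genuinely heterogeneous collapses to Object..., throwing away the static types a real tuple would keep.

public class VarargsDemo {
    // Varargs is sugar for a trailing array, so every argument must share one type.
    static int max(int first, int... rest) {
        int result = first;
        for (int value : rest) {
            result = Math.max(result, value);
        }
        return result;
    }

    // The moment the "tuple" is heterogeneous, the best Java can offer is Object...,
    // which discards the static types entirely.
    static void log(Object... fields) {
        StringBuilder line = new StringBuilder();
        for (Object field : fields) {
            line.append(field).append(' ');
        }
        System.out.println(line);
    }

    public static void main(String[] args) {
        System.out.println(max(3, 1, 4, 1, 5));   // homogeneous: fine
        log("request", 42, true);                 // heterogeneous: everything is just Object
    }
}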
The second tech tip is about “covariant parameter types”. They start off by saying that:
“The intent is to demonstrate that although there are good reasons for implementing covariant return types, implementing covariant method parameters is unsound.”
They do nothing of the sort, proving only that Java can’t do them, so they must not be useful. They provide a solution using generics that ends up using a syntax that leaves you yearning for covariant parameter support, so the message is, to say the least, somewhat confused. I’m sure a previous Tech Tip similarly disproved multiple inheritance’s usefulness by showing that:
class A extends B, C {}
fails to compile under Java. I wonder whether the contempt for computer science has anything to do with the level of sophistication found in the new languages being trundled out for us today.
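Back to the covariance tip: here’s a minimal Java sketch of the distinction it dances around (the Animal/Cow classes are mine, not the Tech Tip’s). A narrowed return type really is an override as of Java 5; a narrowed parameter type silently becomes an overload, so calls through a base-class reference never reach it.

class Food {}
class Grass extends Food {}

class Animal {
    Animal reproduce() { return new Animal(); }
    void feed(Food food) { System.out.println("eats anything"); }
}

class Cow extends Animal {
    // Covariant return type: a legal override since Java 5.
    @Override
    Cow reproduce() { return new Cow(); }

    // Narrowing the parameter does NOT override feed(Food); it merely adds an overload,
    // so dynamic dispatch through an Animal reference never reaches it.
    void feed(Grass grass) { System.out.println("eats only grass"); }
}

public class CovarianceDemo {
    public static void main(String[] args) {
        Animal animal = new Cow();
        animal.feed(new Food());                            // prints "eats anything"
        System.out.println(animal.reproduce().getClass());  // prints "class Cow"
    }
}

The generics-based workaround the Tech Tip proposes presumably amounts to parameterizing the class over its food type, which is exactly the kind of syntax that leaves you yearning for real covariant parameter support.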
Computer science can be useful … it’s the way new programming methods are developed and new algorithms are designed (yeah, hackers can find them too, but you can’t get lucky all of the time). Don’t trash the whole subject as ivory tower bullshit just because it doesn’t involve enough development. The best developers I’ve met are the ones that actually read computer science papers, not those that just scan the latest software development magazines for the latest toolbar screenshots.
Published by marco on 11. May 2004 22:44:05 (GMT-5)
Updated by marco on 11. May 2004 22:54:11 (GMT-5)
UML is bandied about so much these days that it’s considered by many to be standard. It’s standardized, but to be standard it’s got to be in use almost everywhere. Everywhere important, at least. Domain-Specific Modeling and Model Driven Architecture by Steve Cook (PDF) assures us that we who feel uncomfortable with UML’s claims to universality are, in fact, in good company.
Where I work, we do pretty much pure object-oriented designs; every once in a while someone dares put an aspect of our design into UML form. If you only casually use UML, it’s never really obvious what it’s trying to say. If everybody in the room only casually uses UML, you can easily (and sometimes gleefully) spend precious meeting time discussing what exactly an open diamond with a star next to it means. Any reasonably complex system is not interpretable at a glance since “[m]odels are typically decorated with a lot of symbols and textual elements that must be carefully inspected to see the real meaning.”
Steve works for Microsoft and has concisely defined their viewpoint of the state of MDA. They are, all in all, quite excited about it.
“The development of domain-specific models, patterns, and frameworks, organized into software value chains — product lines, promises to industrialize the production of software similar to the way in which the production of many household goods was industrialized in the last century.*”
Those who’ve watched Microsoft get ‘excited’ about things over the years will either:
He talks about “domain-specific modeling language” as being far more useful than UML’s more universal approach. This makes sense in light of the effort needed to get UML used at the developer level, where it was intended. It’s not a tool that developers easily take to because it runs against limits so quickly, like its complete inability to be used as a two-way specification (ability to work in both model and generated code easily) and the fact that “it does not translate very directly into [common] technologies.”
“a UML class cannot be used directly to depict a C# class because [it] does not support the notion of properties … a UML interface cannot be used directly to depict a Java interface because [it] does not contain static fields”
While these are tiny details when viewed from the ivory tower of a specification developer, they are showstoppers as far as a developer using UML is concerned. Microsoft is dead-on in using “UML to the extent that it provides recognizable notation for well-understood concepts” and creating “new conventions” where its vaunted universality is not quite up to snuff.
We should hardly be surprised to see that Microsoft is going to ‘embrace and extend’ UML. They’ve gone to the trouble of writing a whole paper giving good, logical reasons why they’re going to use proprietary …ahem, domain-specific … modeling languages in their tools. We can hardly complain though since the promise of UML as a universal tool was ever a mirage. That Microsoft is forging ahead with their own standard is a foregone conclusion, but at least they are quite justified this time. We, as developers, simply run the risk of getting another de-facto standard which is “whatever Microsoft uses and supports”. Those who develop for the web know how much fun it is when Microsoft’s attention wanders and leaves the world addicted to its half-assed standards.
UML − Unified or Universal Modeling Language? by Dave Thomas (JOT) is a fun read, which tells a bit of the history of UML’s development. Throughout, he hits quite often on his main point:
“Doesn’t it seem odd that a language intended to help developers whose most productive tools are textual editors, outliners, and IDEs has no nice syntactic expression?**”
This is essentially the same argument Steve Cook makes: UML is not being accepted because of systemic problems that can never be addressed, since the parts developers see as problems were deliberately designed in by people who never used the language in the real world. UML is “yet another committee attempt to unify the world in a single grand language — the vain quest for a ‘computer Esperanto’.”
No more being chained to code! Or implementations! Or platforms!
“You just draw the pictures; mix in a few textual specifications, and model-driven code generators will eliminate the need for low-level programming (and programmers!).”
This interpretation of the goals of modeling languages is similar to Steve Cook’s excitement at the coming “industrial age” of software engineering. We, as developers, should not allow grinding capitalist interests to use the modeling languages we design to obviate us within our lifetimes. We should make sure that they, first and foremost, fulfill our needs and make our work easier and more fun, not to mention our products more stable/more maintainable/etc.; whether or not a corporate bottom-line is improved is secondary to us. There are already plenty of people designing our world for bottom-lines; you damned well better make sure you’re pushing back to keep the world better for yourself.
“We need to ensure that important new languages for programming and design have actually been used and tested in real applications before they are foisted on programmers. Standards groups can then play their proper role, which is to develop language standards that are based on real-world best practices.”