Storage Combinators by Marcel Weiher and Robert Hirschfeld (2019, read in 2020)

Published by marco on

Updated by marco on

Disclaimer: these are notes I took while reading this book. They include citations I found interesting or enlightening or particularly well-written. In some cases, I’ve pointed out which of these applies to which citation; in others, I have not. Any benefit you gain from reading these notes is purely incidental to the purpose they serve of reminding me what I once read. Please see Wikipedia for a summary if I’ve failed to provide one sufficient for your purposes. If my notes serve to trigger an interest in this book, then I’m happy for you.

This paper (PDF) (Hasso Plattner Institute, University of Potsdam, Potsdam, Germany) discusses a proposal for composing all objects in a software system using a common API called “Storage Combinators” rather than using custom interfaces everywhere. The authors demonstrate how an application could benefit from the composability of such operations (akin to how the REST architectural style has improved composability for disparate and individually developed services), purportedly without sacrificing any of the expressiveness of a more bespoke API.

The following summary is from the “Discussion” section:

“In-Process REST in general and storage combinators in particular take an architectural style known to work well in the distributed case and scale it down to work in the non-distributed, local case in order to bring along the modularity benefits associated with that style.”

Many bespoke APIs are needlessly different from one another. It’s an interesting idea to enforce API strictures akin to REST between software components running in the same memory space. What is a bit odd is their insistence on using Objective-Smalltalk for all of their examples, a language that I’d never even heard of (and I’ve been paying attention). No-one is using this language[1] and its syntax is based on two of the bigger syntax boondoggles in our industry: Smalltalk and Objective-C. Due to this idiosyncratic choice, it’s not always easy to distill the purpose from the examples.

Still, the storage combinators are interesting and end up being the “In-process REST” that the authors described in the citation above. Once they’ve defined the basic storage API (GET, PUT, PATCH, and DELETE), they define a plethora of common behaviors that compose with it: switches, logging, caching, JSON, etc. From these components, they go on to define an HTTP server and client that communicate via a JSON protocol (naturally).
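To make the idea concrete, here is a minimal sketch in Python rather than the paper’s Objective-Smalltalk. It is not the authors’ implementation: the class names (`DictStore`, `LoggingStore`, `CachingStore`), the string references, and the PATCH merge semantics are all assumptions made for illustration. It only shows how every component exposes the same four operations, so that combinators can wrap one another freely.

```python
from typing import Any, Dict, List, Optional


class DictStore:
    """Leaf store (hypothetical): keeps values in a dict, keyed by a path-like reference."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, ref: str) -> Optional[Any]:
        return self._data.get(ref)

    def put(self, ref: str, value: Any) -> None:
        self._data[ref] = value

    def patch(self, ref: str, changes: Dict[str, Any]) -> None:
        # Assumed semantics: shallow-merge the changes into the stored dict.
        current = dict(self._data.get(ref) or {})
        current.update(changes)
        self._data[ref] = current

    def delete(self, ref: str) -> None:
        self._data.pop(ref, None)


class LoggingStore:
    """Combinator: records each operation, then forwards it to the wrapped store."""

    def __init__(self, source: Any, log: List[str]) -> None:
        self.source = source
        self.log = log

    def get(self, ref: str) -> Optional[Any]:
        self.log.append(f"GET {ref}")
        return self.source.get(ref)

    def put(self, ref: str, value: Any) -> None:
        self.log.append(f"PUT {ref}")
        self.source.put(ref, value)

    def patch(self, ref: str, changes: Dict[str, Any]) -> None:
        self.log.append(f"PATCH {ref}")
        self.source.patch(ref, changes)

    def delete(self, ref: str) -> None:
        self.log.append(f"DELETE {ref}")
        self.source.delete(ref)


class CachingStore:
    """Combinator: answers reads from a cache store, falling back to the source."""

    def __init__(self, source: Any, cache: Any) -> None:
        self.source = source
        self.cache = cache

    def get(self, ref: str) -> Optional[Any]:
        value = self.cache.get(ref)
        if value is None:
            value = self.source.get(ref)
            if value is not None:
                self.cache.put(ref, value)
        return value

    def put(self, ref: str, value: Any) -> None:
        self.cache.put(ref, value)
        self.source.put(ref, value)

    def patch(self, ref: str, changes: Dict[str, Any]) -> None:
        self.source.patch(ref, changes)
        self.cache.delete(ref)  # invalidate the stale cached copy

    def delete(self, ref: str) -> None:
        self.cache.delete(ref)
        self.source.delete(ref)


# Compose: cache in front of a logged in-memory store.
log: List[str] = []
store = CachingStore(source=LoggingStore(DictStore(), log), cache=DictStore())
store.put("/users/1", {"name": "Ada"})
store.get("/users/1")                        # cache hit, never reaches the logger
store.patch("/users/1", {"role": "admin"})   # invalidates the cached copy
result = store.get("/users/1")               # cache miss, read goes through the logger
```

Because both wrappers present the same interface as the leaf store, the calling code doesn’t change when a layer is added or removed, which is the composability the paper is after.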

An interesting addendum is that they’ve actually developed this approach in several industry projects that are in production, so it’s not just an ivory-tower exercise without any real-world basis. I personally wonder how well the average developer can grok and work with what is, in their eyes, a severely reduced API. That is, while it’s definitely possible for a good programmer to build everything they need from these combinators (essentially first-principle building blocks), it’s unclear to me whether that scales to a large enough base of actually existing developers. The approach is complete, but restrictive; it also makes it very difficult for developers to work with existing frameworks and profit from the documentation and community available there.

This possible downside aside, a positive effect was that developers wrote a lot less custom code, instead re-using the existing building blocks and creating their own. When they stayed focused on the compositional pattern, using the at-times very abstract building blocks, they did end up with a well-performing solution. The expected downside was that debugging became more difficult, since the developers didn’t write most of the code themselves and the building blocks are relatively highly abstracted: “it was often impossible to determine the dataflow path before runtime” and “it would be difficult to debug, because a programmer has to step through the whole data transformation path during runtime”.

These are pitfalls associated with any highly generalized framework. They are not incidental problems to be taken care of in minor updates, but point to a possibly fatal flaw in an overgeneralized framework.[2] Despite many iterations, they seem to have landed on a highly generic implementation that cannot be easily used by mere mortals. My own experiences with developing frameworks have taught me to stay slightly less generic with the parts that mortal developers will come into contact with. It’s not that they’re not smart enough to get it, but that they will all have to learn from one of the masters, which doesn’t scale well.

Instead, a slightly more redundant and not as highly generalized architecture tends to allow developers to build up local knowledge and synergies without constant “herding” by senior developers. They end up with more code-duplication and are less able to benefit from performance improvements and bug fixes in common components, but they are able to work more autonomously, or even at all. The debugging and introspection issues mentioned above are kind of framework-killers as I don’t really see a good way around them short of developing a highly targeted DSL with its own source-level debugger, IDE, etc.

The conclusion of the paper is a bit at odds with itself: on the one hand, it writes,

“Their use correlates strongly with positive effects on code-size, performance, reliability and productivity, both observationally and in the minds of developers.”

But then immediately writes that,

“One area of future research is how to type and statically type-check storage combinators. The same generic nature that makes storage combinators so composable also makes it difficult to verify when they are connected correctly.”
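A small sketch of what that verification problem can look like, again in Python with hypothetical names (`DictStore`, `JSONStore`) and not taken from the paper: a JSON combinator assumes that the store beneath it holds JSON text, but nothing in the shared interface expresses that assumption, so a misconnection only shows up when a request actually flows through it.

```python
import json
from typing import Any, Dict, Optional


class DictStore:
    """Leaf store (hypothetical): holds arbitrary values in a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, ref: str) -> Optional[Any]:
        return self._data.get(ref)

    def put(self, ref: str, value: Any) -> None:
        self._data[ref] = value


class JSONStore:
    """Combinator: serialises values to JSON text on write, parses them on read."""

    def __init__(self, source: Any) -> None:
        self.source = source

    def get(self, ref: str) -> Optional[Any]:
        raw = self.source.get(ref)
        return None if raw is None else json.loads(raw)

    def put(self, ref: str, value: Any) -> None:
        self.source.put(ref, json.dumps(value))


raw = DictStore()
store = JSONStore(raw)

# Correctly connected: the value round-trips through JSON text.
store.put("/ok", {"a": 1})
round_trip = store.get("/ok")

# Misconnected: another component writes a dict straight into the raw store.
raw.put("/config", {"debug": True})
try:
    store.get("/config")
    error: Optional[Exception] = None
except TypeError as exc:
    # json.loads rejects non-text input, but only once the read happens.
    error = exc
```

Both stores satisfy the same generic interface, so the mismatch is invisible when the pieces are wired together and surfaces only at runtime.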

This seems kind of like a fatal flaw that the team has managed to patch by having good developers who intimately understand the framework, and probably with lots of debugging hours. Their next topic in the conclusion is about debugging and discusses a DSL as I mentioned above as a possible solution.

None of these things will be easy to implement or build and it’s honestly unclear to me whether they can even be built without sacrificing the flexibility or the purity of the initial approach. In the end, you’ll end up with a compromised system with bespoke everything and many, many idiosyncratic and poorly documented behaviors. And, since everything’s bespoke, you can’t lean on external communities and documentation and search for anything. I wish them the best of luck, but don’t hold out much hope for the path that they’re on.

[1] It doesn’t have a Wiki page and the only home page I can find looks like it was built with Doxygen anno 1999 and the certificate isn’t even valid.↩

[2] While it’s possible for an underlying framework to be this highly generalized, it’s generally inefficient to have all of your developers working at this level; most of them aren’t skilled enough to do so. That is, a framework can benefit from a high level of generalization that significantly reduces code-duplication, but that part should only rarely be visible to mortal developers. Instead, they should float in a substrate of other APIs that are more straightforward to specify (i.e., high-level APIs that benefit from composability but don’t necessarily expose it) and are more easily debugged and tested.↩

Citations

None.