To crash or not to crash; that is the question

Published by marco on

Note: I found this old draft containing my response to a colleague.

I 100% agree with you, in general. I absolutely want to know immediately when an assumption I’ve made does not hold.

But…😁

The degree to which I’m willing to crash depends on whose consistency I’m basing my assumptions on. When I call a method in my code from another method in my code, I’m absolutely going to assert that an argument is not null. I can control that. My IDE will tell me when I might be passing null. That is definitely a programming error.
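A minimal sketch of that kind of internal assertion, in Python rather than whatever language the tool in question uses. The names `summarize` and `render_summary` are hypothetical and exist only for illustration.

```python
def summarize(items):
    # Every caller lives in my own code, so None here is a programming error.
    # Note: Python skips assert statements when run with -O, so this is a
    # development-time check rather than input validation.
    assert items is not None, "items must not be None"
    return f"{len(items)} item(s)"


def render_summary(items):
    # Another method in the same codebase. It relies on summarize() to
    # fail loudly instead of guarding against None itself.
    return "Summary: " + summarize(items)
```

Calling `render_summary(None)` stops immediately with an `AssertionError`, which is exactly the "tell me right away" behavior I want for assumptions I control.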

When I’m getting external input (e.g. from the Windows registry), I’m a bit more cautious because I’m less sure how solid my assumption is. I know what the documentation says, but a lifetime of programming has taught me that some things (like the Windows registry) work exactly as expected on my modern developer machine and then fail mysteriously, in ways I can’t predict, on a perhaps less modern machine.
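A sketch of what that caution might look like, assuming the external values have already been read into a plain mapping (a stand-in for the registry, not a real registry API). The function name `read_timeout_seconds`, the key `TimeoutSeconds`, and the default of 30 are all invented for the example.

```python
from typing import Mapping, Optional, Tuple


def read_timeout_seconds(
    settings: Mapping[str, object], default: int = 30
) -> Tuple[int, Optional[str]]:
    """Return (value, warning); warning is None when the input was as documented."""
    raw = settings.get("TimeoutSeconds")
    if raw is None:
        return default, "TimeoutSeconds is missing; using default"
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # The documentation promised a number, but this machine disagrees.
        return default, f"TimeoutSeconds has unexpected value {raw!r}; using default"
    if value <= 0:
        return default, f"TimeoutSeconds must be positive, got {value}; using default"
    return value, None
```

The warning is returned rather than swallowed, so the caller can still surface it to the user or a log. The program keeps running, but the broken assumption is not hidden.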

Therefore, I’m a bit careful about what I’m willing to pay to find errors. The primary purpose of a program is to bring value to the customer/user. I want to improve my program for more situations, but how am I going to find out in which situations it doesn’t work?

I can test, of course, but some things will only ever happen in the field. If it happens in the field, then I’m using the customer’s/user’s time to help me fix my program (they benefit, of course, but not for free). Can I soften the blow to the user of having to help me improve the program without sacrificing consistency or accuracy?

Sometimes, the answer is a resounding no. The program absolutely cannot continue if, e.g., the reference to the data it needs to work on is null. That’s a no-go. There’s no way to rescue the program from that, or to complete any other useful work.

In the case of this tool, if it crashes, the user no longer gets a report. Would they have been able to get some of the report if it hadn’t crashed? In this case, yes. All of the other checks could be run. The checks that crashed would show as “failed” with the exception message. That seems to me to be better than skipping all subsequent checks when one crashes.
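Roughly, the idea could be sketched like this in Python. This is not the tool’s actual code: `CheckResult`, `run_checks`, and the three example checks are hypothetical, and the failing check only simulates an unexpected error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    message: str = ""


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every check; a check that raises is reported as failed, not fatal."""
    results = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, bool(check())))
        except Exception as exc:
            # Keep the exception text so the user has something to report.
            results.append(CheckResult(name, False, f"{type(exc).__name__}: {exc}"))
    return results


def check_registry_key() -> bool:
    raise OSError("key not found")  # simulated failure in the field


report = run_checks({
    "disk space": lambda: True,
    "registry key": check_registry_key,
    "network": lambda: True,
})
for result in report:
    status = "ok" if result.passed else "failed"
    print(f"{result.name}: {status} {result.message}".rstrip())
```

The catch is deliberately limited to the boundary around a single check. Anything that goes wrong outside that loop still crashes, which keeps the "fail fast" behavior for errors that really are unrecoverable.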

I can even continue to hope that the user then reports the mysterious error message they got for one of the checks! Hope dies last!

I’m delighted to discuss programming and error-handling philosophy in person next week!