Make Illegal States Unrepresentable: Kotlin Data Class Edition

Because Murphy’s Law is a software engineering technique


Murphy’s Law and Software Engineering

Murphy’s law is commonly quoted as, “Anything that can go wrong, will.” The pedants among us will know that this is not, in fact, Murphy’s law. It’s Finagle’s law. Or Sod’s law. Murphy’s actual law is better stated as, “If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it that way.”

The origin of Murphy’s law is usually ascribed to Edward A. Murphy, an engineer working with the U.S. Air Force in 1949 on project MX981. The Air Force wanted to know what would happen to the fragile human body if it were accelerated up to ludicrous speeds. Clearly the best way to test this would be to actually accelerate a human body up to ludicrous speeds. So they strapped a poor test subject into a sled, strapped rockets to their back and fired them off. I’m sure I’ve seen that in a Road Runner cartoon.

In order to translate this ‘experiment’ from schadenfreude into science, they attached sixteen accelerometers to the test subject. There were two possible ways the sensors could be attached. The right way, and the wrong way. Every sensor was attached the wrong way. So it was schadenfreude after all¹.

But it was also a lesson in engineering, because if the sensors had not been designed such that there were two possible ways to install them, and one of them incorrect, the disaster could not have happened. This principle can be generalised and is certainly applicable to software engineering.

In particular, if we are fortunate enough to be working with a typed language, we should let the types do as much of the heavy lifting as possible. Write your code such that it is not possible to construct objects or call methods the wrong way. This is not a new idea, but it is much easier to do in languages with an expressive type system such as OCaml, F#, Scala and Idris.

In this article we’re going to look at Kotlin, and in particular at how data classes, sealed classes and null-safety can be combined simply but powerfully to eliminate a class of error that typically plagues programs written in Java (and in languages with similar features).

Use Case: Ordering a Credit Report

Businesses in Australia can be uniquely identified by either an Australian Business Number (ABN) or an Australian Company Number (ACN). All registered businesses will have an ACN. The preferred identifier is the ABN, but not all businesses will have an ABN. Due to the complexities of business entities and credit reporting, we require the ability to identify a business either by ABN or by ACN.

Because we are good developers who love validation and hate stringly-typed programming, we create value types for ABN and ACN, like so:

Value classes with validation
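The listing itself is missing here, so the following is only a minimal sketch of what such value types might look like. The class names `ABN` and `ACN` and the `value` property are assumptions, and the validation is deliberately simplified to a length-and-digits check: an ABN has 11 digits and an ACN has 9, but both also carry a checksum that is left out of this sketch.

```kotlin
// Sketch only: real ABN/ACN validation would also verify a checksum.
data class ABN(val value: String) {
    init {
        // Reject anything that is not exactly 11 ASCII digits.
        require(value.length == 11 && value.all { it in '0'..'9' }) {
            "An ABN must be exactly 11 digits, got '$value'"
        }
    }
}

data class ACN(val value: String) {
    init {
        // Reject anything that is not exactly 9 ASCII digits.
        require(value.length == 9 && value.all { it in '0'..'9' }) {
            "An ACN must be exactly 9 digits, got '$value'"
        }
    }
}
```

With types like these, a raw String can no longer be passed where an ABN or ACN is expected, and a malformed value fails at construction with an IllegalArgumentException.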

Next, suppose we need a method which will purchase a report for a company. We can supply either an ABN or an ACN. Considered in isolation, the first thing that might spring to mind would be overloaded methods, like so:
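A sketch of the overloaded approach, assuming simple `ABN` and `ACN` wrapper types (validation omitted for brevity). The bodies just print a message; what purchasing a report actually involves is not specified in this article.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

// One overload per identifier type; the compiler picks the right one
// from the static type of the argument.
fun purchaseReport(abn: ABN) {
    println("Purchasing report by ABN ${abn.value}")
}

fun purchaseReport(acn: ACN) {
    println("Purchasing report by ACN ${acn.value}")
}
```

This works when the caller knows which identifier it holds, but every intermediate function that merely passes the identifier along would need the same pair of overloads.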

However, let’s suppose that we’re building a pipeline which will require our business key to be passed around through multiple processing steps. This quickly makes overloaded methods untenable. The next thing you would probably consider is sticking both together into a parameter class.

Data class with nullable types
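Something along these lines, as a sketch. The property names `abn` and `acn` are assumptions, and the wrapper types are reduced to bare data classes.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

// Both identifiers are optional, so four combinations are representable,
// including the one where neither is present.
data class OrganizationKey(val abn: ABN?, val acn: ACN?)
```

Note that `OrganizationKey(null, null)` compiles without complaint.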

That’s fine, so far as it goes. But what if they’re both null? Wherever our business key goes we’ll end up with code like this:

Famous last words…
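A sketch of the kind of defensive code in question, using the nullable `OrganizationKey` assumed above. The exception type and message are illustrative choices.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)
data class OrganizationKey(val abn: ABN?, val acn: ACN?)

fun purchaseReport(key: OrganizationKey) {
    if (key.abn != null) {
        println("Purchasing report by ABN ${key.abn.value}")
    } else if (key.acn != null) {
        println("Purchasing report by ACN ${key.acn.value}")
    } else {
        // "This can never happen."
        throw IllegalStateException("OrganizationKey has neither an ABN nor an ACN")
    }
}
```

Every consumer of the key needs its own copy of that final branch, and each copy is a promise that the impossible case stays impossible.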

Have you ever written code like this? I certainly have. In Java, it’s considered good practice! Defensive programming. Cover all your bases. Even the bases that can’t possibly happen. Because, remember Murphy’s law, if it can happen the wrong way — it will. Usually at 2:00am when you’re on call for support.

Kotlin gives us better options. Let’s try and eliminate those offensive null checks.

Attempt 1: Constructor Validation

We could validate that at least one of ACN or ABN is present when we construct an OrganizationKey. It would look something like this:

Constructor validation
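A minimal sketch, assuming the check lives in an `init` block (the original listing is not shown, so the exact placement is a guess):

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

data class OrganizationKey(val abn: ABN?, val acn: ACN?) {
    init {
        // Runs on every construction, but only at runtime.
        require(abn != null || acn != null) {
            "Either an ABN or an ACN must be supplied"
        }
    }
}
```

The invalid state is now rejected, but the properties are still nullable, so every consumer still has to deal with `ABN?` and `ACN?`.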

It’s not really that much better, to be honest. I refer to this kind of solution as ‘shovelling food around on your plate’. Sometimes you can move the issue somewhere else, but it still exists.

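Actually, since we’re using Kotlin data classes, there’s a further catch. Data classes expose a generated ‘copy’ method that would trivially allow a sleep-deprived developer to do the wrong thing. Because copy calls the primary constructor, a check in an init block still fires, but only at runtime; a check placed anywhere else is bypassed altogether. Either way the compiler is perfectly happy. Like so: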

Damaged data class
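The listing is missing here, so this is a sketch of the kind of call in question, reusing the assumed init-block validation. One caveat worth stating: `copy` goes through the primary constructor, so an init-block check like this one is not skipped; it simply fires at runtime rather than being caught by the compiler. Validation that lives outside the primary constructor would be skipped entirely. The function name `damage` and the sample ABN are purely illustrative.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

data class OrganizationKey(val abn: ABN?, val acn: ACN?) {
    init {
        require(abn != null || acn != null) {
            "Either an ABN or an ACN must be supplied"
        }
    }
}

fun damage(): OrganizationKey {
    val key = OrganizationKey(abn = ABN("51824753556"), acn = null)
    // Type-checks fine: `abn` is declared as ABN?, so null is a legal argument.
    // The mistake only shows up when this line actually executes.
    return key.copy(abn = null)
}
```

Calling `damage()` throws an IllegalArgumentException, which is better than silently producing a broken key but is still a 2:00am problem rather than a compile-time one.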

Attempt 2: Factory methods

Another approach that might occur if you’re coming to Kotlin from Java is to use factory methods.

In Kotlin it would look like below:

Factory methods
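As a sketch, with a private constructor and a pair of factory functions on the companion object. The factory name `of` is an assumption.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

data class OrganizationKey private constructor(val abn: ABN?, val acn: ACN?) {
    companion object {
        // The only sanctioned ways to build a key: exactly one identifier each.
        fun of(abn: ABN): OrganizationKey = OrganizationKey(abn, null)
        fun of(acn: ACN): OrganizationKey = OrganizationKey(null, acn)
    }
}

// The escape hatch. Left commented out because how the compiler treats a
// copy call on a class with a private constructor (silently, with a warning,
// or as an error) depends on the Kotlin version in use:
// val broken = OrganizationKey.of(ABN("51824753556")).copy(abn = null)
```

There is no validation for `copy` to trip over this time, so where that call is accepted it produces a key with neither identifier.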

In fact, if you try this in IntelliJ, the IDE itself will slap you on the wrist and give you an ugly yellow squiggle explaining that the ‘copy’ method allows you to bypass the private constructor. So in fact this solution is no better than the one above.

You could change this to not be a data class in order to avoid exposing a copy method, but that would just be a shame. Luckily a simple solution to our problem does exist.

Attempt 3: Sealed classes

Let’s subclass OrganizationKey, giving each subclass either an ABN or an ACN. Then we can make them non-nullable. We should make it a ‘sealed’ class while we’re at it — for reasons that will be explained momentarily.

Sealed data class
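A sketch of the shape this takes. The subclass names `ByAbn` and `ByAcn` are assumptions; any names would do.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

sealed class OrganizationKey {
    // Each subclass carries exactly one identifier, and it is non-nullable.
    data class ByAbn(val abn: ABN) : OrganizationKey()
    data class ByAcn(val acn: ACN) : OrganizationKey()
}
```

A key is now either a `ByAbn` or a `ByAcn`. There is no way to spell "neither" or "both", and `copy` on either subclass can only swap one valid identifier for another.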

Well that’s better! I think we might have cracked it. When we need to use our key, it would look similar to this:
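For instance, a sketch of `purchaseReport` over the sealed hierarchy (subclass names assumed as before, bodies reduced to a println):

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

sealed class OrganizationKey {
    data class ByAbn(val abn: ABN) : OrganizationKey()
    data class ByAcn(val acn: ACN) : OrganizationKey()
}

fun purchaseReport(key: OrganizationKey) {
    when (key) {
        // Inside each branch `key` is smart-cast to the matched subclass,
        // so `key.abn` and `key.acn` are available without an explicit cast.
        is OrganizationKey.ByAbn -> println("Purchasing report by ABN ${key.abn.value}")
        is OrganizationKey.ByAcn -> println("Purchasing report by ACN ${key.acn.value}")
    }
}
```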

Note the ‘smart casts’ at play here. Unlike Java, you don’t need to cast the organization key down to a subclass after the type check. Also, now that we have removed the need to have a nullable type around ABN or ACN, we can’t accidentally construct an invalid key.

ABN is no longer nullable.
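A sketch of the attempt, with the offending line commented out so that the rest of the snippet still compiles. The names are the same assumed ones as before and the ABN is a sample value.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

sealed class OrganizationKey {
    data class ByAbn(val abn: ABN) : OrganizationKey()
    data class ByAcn(val acn: ACN) : OrganizationKey()
}

// Fine: a real ABN goes in.
val valid: OrganizationKey = OrganizationKey.ByAbn(ABN("51824753556"))

// Uncomment to see the compiler refuse: `abn` has the non-null type ABN,
// so null is not an acceptable argument.
// val invalid: OrganizationKey = OrganizationKey.ByAbn(null)
```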

The code above will not compile. Job done!

Note this doesn’t protect you against a malicious actor looking to inject bugs. Reflection, or anything passing in or out of a Java library (especially one that marshals or unmarshals, such as Hibernate, Gson or Jackson), can also sink your battleship. But you have a much stronger compile-time guarantee against someone accidentally using the object in the wrong way.

But why did we need to make OrganizationKey a sealed class? Because with that one little keyword we grant ourselves an extra layer of protection.

What if someone adds an extra type?
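To see what is at stake, here is a sketch of the same hierarchy without the sealed keyword. The third subclass, `ByName`, is a made-up identifier type used purely for illustration.

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

// Deliberately NOT sealed, to show what the keyword buys us.
abstract class OrganizationKey
data class ByAbn(val abn: ABN) : OrganizationKey()
data class ByAcn(val acn: ACN) : OrganizationKey()

// The newcomer, added later by someone else.
data class ByName(val name: String) : OrganizationKey()

fun purchaseReport(key: OrganizationKey) {
    when (key) {
        is ByAbn -> println("Purchasing report by ABN ${key.abn.value}")
        is ByAcn -> println("Purchasing report by ACN ${key.acn.value}")
        // No branch for ByName, and no complaint from the compiler:
        // a ByName key falls straight through and nothing is purchased.
    }
}
```

Calling `purchaseReport(ByName("Acme Pty Ltd"))` compiles, runs, and quietly does nothing.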

Uh-oh. Now we’ve got to go add an extra branch everywhere we were using our key. Hope we’ve got good tests!

Well actually, we’re better off than that. Marking OrganizationKey as sealed allows the compiler to do an exhaustivity check wherever a when statement is used as an expression.

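We’ll have to change ‘purchaseReport’ to return something instead of returning Unit, but then the compiler will catch any when expressions that do not check all possible types. (From Kotlin 1.7 onward the compiler also rejects a non-exhaustive when statement over a sealed type, so on recent versions the check applies even without that change.)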
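A sketch of the expression form, again using the hypothetical `ByName` as the newly added type and a String as a stand-in for whatever a real report would be:

```kotlin
data class ABN(val value: String) // validation omitted for brevity
data class ACN(val value: String)

sealed class OrganizationKey {
    data class ByAbn(val abn: ABN) : OrganizationKey()
    data class ByAcn(val acn: ACN) : OrganizationKey()
    data class ByName(val name: String) : OrganizationKey() // the new type
}

// Returning a value makes `when` an expression, which must be exhaustive.
fun purchaseReport(key: OrganizationKey): String = when (key) {
    is OrganizationKey.ByAbn -> "report for ABN ${key.abn.value}"
    is OrganizationKey.ByAcn -> "report for ACN ${key.acn.value}"
    // Delete this branch and the function no longer compiles.
    is OrganizationKey.ByName -> "report for ${key.name}"
}
```

No `else` branch is needed, and adding one would defeat the purpose: it would swallow any future subclass instead of forcing each call site to be revisited.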
