
Type Early, Type Often

Continuing the series of articles about what some might consider best practices, but what I like to term “things I do that haven’t seemed like a bad idea yet”, this post looks at (very) simple ways to improve your Scala style using the type system, and in particular case classes.

Stringly Typed Programming

Consider the following domain object:

case class Person(first: String, last: String, address: String, age: Int, ssn: String)

Most people coming to Scala are attracted to the case class concept very quickly. Compared with Java in particular (following standard JavaBean practices), the single-line definition above stands in for tens of lines of code. Here’s a quick refresher on what you get with a case class:

  1. Immutable (by default) public value fields (properties) for all of the parameters;
  2. A companion object with a factory .apply method: 
    val p = Person("Fred", "Smith", "123 Main", 25, "123-45-6789")
  3. .equals and .hashCode methods that work;
  4. An .unapply method (aka an Extractor) that enables deep pattern-matching against the contents of any instances of this class;
  5. A convenient .toString override;
  6. A .copy method that allows convenient construction of new instances based on the current instance, with one (or more, or fewer, come to that) fields changed:
    val p2 = p.copy(first = "Frannie", ssn="987-65-4321")
  7. Plus a couple of less well-known but still useful extras: the case class itself extends Product, and the companion object (unless you define it yourself) extends FunctionN. Once you know about them, these can be used effectively, particularly in type class patterns; see the Typesafe Activator template on type class tricks for some examples.
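To make the list above concrete, here is a small sketch that exercises each generated feature on the original, stringly typed Person. The object name CaseClassFeatures and its val names are made up for illustration. One hedge: the companion-as-FunctionN point holds for Scala 2; eta-expanding Person.apply, as done here, yields an equivalent function value in both Scala 2 and Scala 3.

```scala
case class Person(first: String, last: String, address: String, age: Int, ssn: String)

// Hypothetical demo object: one val per generated feature.
object CaseClassFeatures {
  // 2. Companion apply: no `new` keyword needed
  val p: Person = Person("Fred", "Smith", "123 Main", 25, "123-45-6789")
  private val twin: Person = Person("Fred", "Smith", "123 Main", 25, "123-45-6789")

  // 3. Structural equals and a consistent hashCode
  val sameValue: Boolean = p == twin
  val sameHash: Boolean = p.hashCode == twin.hashCode

  // 4. unapply lets us pattern match on (and filter by) field contents
  val greeting: String = p match {
    case Person(first, "Smith", _, age, _) if age < 30 => s"Young $first Smith"
    case Person(first, last, _, _, _)                  => s"$first $last"
  }

  // 5. Readable toString
  val shown: String = p.toString

  // 6. copy with named arguments changes only the listed fields
  val p2: Person = p.copy(first = "Frannie", ssn = "987-65-4321")

  // 7. Product: arity and positional access to the fields
  val arity: Int = p.productArity
  val firstField: Any = p.productElement(0)

  // 7. Constructor as a plain function value, then tupled
  val mk: (String, String, String, Int, String) => Person = Person.apply
  val fromTuple: Person = mk.tupled(("Fred", "Smith", "123 Main", 25, "123-45-6789"))
}
```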

In short, you get a lot of bang for your one buck (or line). There are some limitations (one case class may not inherit from another, either directly or indirectly, and some of the features depend on the class having 22 or fewer fields), but the humble case class is a great basis for domain models.

So, are we done? Well, no! Here’s the problem with what we have so far. Let’s create a new person, say Harry Potter:

val harry = Person("Potter", "Harry", "123-45-6789", 25, "12 Grimauld Place, London")

What’s wrong with this, other than Harry having apparently stolen Fred Smith’s identity (the Social Security Number is the same)? We have switched the first and last names, and the SSN and address, in the call. It compiles just fine, but because the domain object is “stringly” typed, the values land in the wrong fields and the compiler has no way of knowing. Our IDE might at least show us the name of the field in each position, but there is nothing stronger to warn us of our mistake.

The only field we know is in the right place is the age, since that is an Int and is not subject to the stringly-typed universe we have created for ourselves.

Strongly Typed Programming

Let’s make this situation better:

case class Address(street: String, city: String, zipOrPostcode: String)
case class FirstName(name: String)
case class LastName(name: String)
case class SSN(id: String)
case class Age(age: Int)
case class Person(first: FirstName, last: LastName, address: Address, age: Age, ssn: SSN)

Person(FirstName("Harry"), LastName("Potter"),
  Address("12 Grimauld Place", "London", "E10"),
  Age(25), SSN("123-45-6789"))

It’s a bit more typing, but brings a lot more value. Reading the code makes it extremely clear what the fields are, and the compiler knows too. We can’t accidentally mix up first and last names any more, nor any of the other fields.

Other possibilities open up too. For example, we could restrict Age to values between 0 and 150 (I’m an optimist), or make sure the SSN is a valid one. Any pedants out there will point out that Harry, being British, would have a National Insurance number instead; the type system could police that for us as well, with the introduction of a BritishPerson case class and a NINum case class.
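As a sketch of the first two ideas, assuming a simple range check is enough for Age and that an NNN-NN-NNNN format check is an acceptable stand-in for “valid” (a real SSN check would need more than that), each type can validate itself in its constructor and also offer a non-throwing factory. The isValid and safe method names are hypothetical.

```scala
final case class Age(value: Int) {
  // Runs on every construction, so apply and copy are both guarded.
  require(Age.isValid(value), s"age out of range: $value")
}

object Age {
  def isValid(value: Int): Boolean = value >= 0 && value <= 150

  // Non-throwing alternative to Age(...): Left carries the reason.
  def safe(value: Int): Either[String, Age] =
    if (isValid(value)) Right(Age(value)) else Left(s"age out of range: $value")
}

final case class SSN(id: String) {
  require(SSN.isValid(id), s"not in NNN-NN-NNNN form: $id")
}

object SSN {
  // Format check only: it does not confirm the number was ever issued.
  def isValid(id: String): Boolean = id.matches("""\d{3}-\d{2}-\d{4}""")

  def safe(id: String): Either[String, SSN] =
    if (isValid(id)) Right(SSN(id)) else Left(s"not in NNN-NN-NNNN form: $id")
}
```

Callers who can handle bad input gracefully use safe; code that treats invalid data as a programming error can call the constructor directly and let require throw.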

Taking it further, the Address fields could (and probably should) also be typed into domain objects. StreetAddress, City and ZipOrPostcode types seem sensible.
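A minimal sketch of that typed Address, assuming each wrapper simply holds a String in a field called value (the field name is an assumption):

```scala
// Hypothetical wrapper types for each address component.
final case class StreetAddress(value: String)
final case class City(value: String)
final case class ZipOrPostcode(value: String)

final case class Address(street: StreetAddress, city: City, zipOrPostcode: ZipOrPostcode)

object AddressExample {
  val home: Address =
    Address(StreetAddress("12 Grimauld Place"), City("London"), ZipOrPostcode("E10"))

  // Extractors nest, so a match can reach straight through the wrappers.
  val cityName: String = home match {
    case Address(_, City(name), _) => name
  }
}
```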

This might seem obvious, but some of the less immediate advantages perhaps aren’t. As well as providing a level of built-in validation of the domain data, the regular structure of case classes, once understood, makes it easy to apply structured (de)serialization using type classes (and, with practice, to do so somewhat generically). We now have meaning and structure in the domain model, and that lends itself to uses in databases, data relationships and much more.
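To illustrate the type class point, here is a sketch built around a hypothetical Encode type class (not from any library) that turns values into simple key=value text. Person is trimmed to three fields for brevity. Each field type gets its own instance, and the Person instance is assembled from those, so the format for, say, LastName is defined once and reused wherever a LastName appears.

```scala
// Hypothetical minimal type class: how to render an A as text.
trait Encode[A] {
  def encode(a: A): String
}

object Encode {
  // Summoner, so call sites can write Encode[Person].encode(p)
  def apply[A](implicit enc: Encode[A]): Encode[A] = enc

  def instance[A](f: A => String): Encode[A] = new Encode[A] {
    def encode(a: A): String = f(a)
  }
}

final case class FirstName(name: String)
final case class LastName(name: String)
final case class Age(age: Int)
final case class Person(first: FirstName, last: LastName, age: Age)

object Encoders {
  implicit val firstNameEnc: Encode[FirstName] = Encode.instance[FirstName](f => s"first=${f.name}")
  implicit val lastNameEnc: Encode[LastName] = Encode.instance[LastName](l => s"last=${l.name}")
  implicit val ageEnc: Encode[Age] = Encode.instance[Age](a => s"age=${a.age}")

  // Person's encoding is derived from the encodings of its field types.
  implicit def personEnc(implicit
      fn: Encode[FirstName],
      ln: Encode[LastName],
      ag: Encode[Age]
  ): Encode[Person] =
    Encode.instance[Person] { p =>
      List(fn.encode(p.first), ln.encode(p.last), ag.encode(p.age)).mkString(";")
    }
}

object EncodeExample {
  import Encoders._

  val harry: Person = Person(FirstName("Harry"), LastName("Potter"), Age(25))
  val encoded: String = Encode[Person].encode(harry)
}
```

Libraries that derive such instances generically from a case class’s structure build on the same idea; this hand-written version just shows the shape of it.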

Another subtle advantage is the idea of data ontologies, that is, establishing a library of reusable types across your project or organization. A string is just a string, but in many disciplines, types like Names, Invoices, Sequences, Resources and so on have specific meanings, and some of those agreed, shared meanings can be used across your code base. Doing so makes APIs safer and more uniform. Such data libraries need to be maintained as living things, but the tools are available for you to establish and maintain them within the Scala type system.

More Than Just Parameters

Once we have our types, methods can take them of course, e.g.

def isASmith(lastName: LastName): Boolean = lastName.name == "Smith"

But perhaps even more useful is that we can employ similar typing for return values. Consider a stats method that returns the mean and standard deviation of a list of numbers:

def meanAndStdDev(xs: List[Double]): (Double, Double) = {
   val mean = xs.sum / xs.size
   val stdDev = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum)
   (mean, stdDev)
}

import scala.util.Random

val r = new Random

scala> val xs = List.fill(10)(r.nextDouble() * 20 + 10)
xs: List[Double] = List(16.516477423883806, 19.78821074085854, 26.785718368906707, 18.32337256131118, 29.181885673264194, 23.36482220857569, 12.201241719137885, 27.345167444914406, 13.683247264299627, 19.24236003525099)

scala> meanAndStdDev(xs)
res6: (Double, Double) = (20.643250344040304,17.543536687262595)

Tuples allow multiple data items to be returned from a function, but how much better to use another case class:

case class Stats(mean: Double, stdDev: Double)

def meanAndStdDev(xs: List[Double]): Stats = {
   val mean = xs.sum / xs.size
   val stdDev = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum)
   Stats(mean, stdDev)
}

scala> val stats = meanAndStdDev(xs)
stats: Stats = Stats(20.643250344040304,17.543536687262595)

scala> stats.mean
res9: Double = 20.643250344040304

scala> stats.stdDev
res10: Double = 17.543536687262595

We could easily extend Stats with many more fields (min, max, median, mode and so on) without clients of our API ever having to keep track of which position holds which number. Instead, the case class becomes part of the API: it tells callers what each value means, and it defines a useful type that other functions can require, knowing exactly which statistical fields will be available to them.
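One caveat about meanAndStdDev as written: it takes the square root of the summed squared deviations without first dividing by the number of values, which is why the reported figure (about 17.5) looks large for values that all lie between 12 and 29. Dividing by xs.size first gives the population standard deviation (divide by xs.size - 1 for the sample version). Below is a sketch of an extended Stats that does this and adds min, max and median; the extra field names and the of factory are assumptions for illustration, and returning Option is one way to avoid dividing by zero on an empty list.

```scala
final case class Stats(mean: Double, stdDev: Double, min: Double, max: Double, median: Double)

object Stats {
  // None for an empty list, where mean, min and max are undefined.
  def of(xs: List[Double]): Option[Stats] =
    if (xs.isEmpty) None
    else {
      val n = xs.size
      val mean = xs.sum / n
      // Population variance: average of the squared deviations from the mean.
      val variance = xs.map(x => (x - mean) * (x - mean)).sum / n
      val sorted = xs.sorted
      // Middle element for odd n, average of the two middle elements for even n.
      val median =
        if (n % 2 == 1) sorted(n / 2)
        else (sorted(n / 2 - 1) + sorted(n / 2)) / 2
      Some(Stats(mean, math.sqrt(variance), sorted.head, sorted.last, median))
    }
}
```

For example, Stats.of(List(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)) gives a mean of 5.0 and a population standard deviation of 2.0, with every field accessed by name rather than by position.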

How Low Can You Go?

Or better asked: how early can you type?

Establishing a set of domain types, or identifying (and potentially enhancing) existing types that could be used, is most often where I start with a new problem these days. Once you have your vocabulary established, writing meaningful code is much easier.

Case classes are one of Scala’s great gifts, and this article barely scratches the surface of their uses. They can form the basis of algebraic data types, useful for advanced functional concepts, and the Scala compiler itself leans on case classes and pattern matching to construct and refine its abstract syntax tree.
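As a small, hypothetical example of an algebraic data type: a sealed trait with case class and case object variants. Since one case class can’t extend another, the shared parent is a trait, and sealing it lets the compiler check that a match covers every variant. The Shape names are illustrative only.

```scala
// A sealed parent: all variants must be declared in this file.
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rectangle(width: Double, height: Double) extends Shape
case object Empty extends Shape

object Shapes {
  // Because Shape is sealed, the compiler can warn if a variant is missing here.
  def area(s: Shape): Double = s match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
    case Empty           => 0.0
  }
}
```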
