Thursday, June 11, 2020

Reading Scalaz API Functions (Principles 5, Scalaz Files)

The Scalaz API has a large number of functions, most of which have several things in common:

  1. They have type parameters.
  2. Their implementations are very short, typically one line.
  3. They do not have Scaladoc.

While they don’t have Scaladoc, their types are printed in the Scaladoc pages. However, with some practice, you can get a quick, accurate understanding of each function by taking into account two things:

  1. One of the goals of the Scalazzi Safe Scala Subset is that “Types Are Documentation”. For example, one of the rules is “no side-effects”. Side effects, by their nature, do not appear in type signatures of methods that create them; by banishing them, we eliminate that source of untyped program behavior.
  2. Heavy use of type parameters, and minimization of concrete information in each type signature, amplifies parametricity. For Scalaz API functions, there are far fewer factors to consider when working out what a type “means” than you must consider when reading typical Scala libraries.

Adding Scaladoc may be valuable to many users. However, with these factors, the value is greatly diminished, and it’s difficult to justify the development cost of comprehensive Scaladoc when we are already encouraging users to think in terms of “Types Are Documentation”.

That doesn’t mean there aren’t some simple “tricks” and rules-of-thumb that you can learn to accelerate this process.

Values of a type variable cannot arise from thin air.

Suppose that a function has a type variable B, and this type variable is used in the return type, which might be something like (List[Int], List[B]). There is no way for the function to make up its own Bs, so all Bs appearing in that list must have come from the arguments. If there were no Bs in the arguments, then that List[B] must be empty.

This rule, “must come from the arguments”, is significantly more flexible than it sounds, while allowing the caller to preserve data integrity in a type-checked way. For example, the signature def m[A, B](xs: List[A], f: A => B): List[B] does not require the result list to be empty, because the arguments supply a “way to get Bs”:

  1. Take any element from the xs list.
  2. Call f on that element.

Since the body of m has no other source of As than xs, it’s a fact of the return value that “all elements must have come from calling f on elements of xs”.

In this way, you can use a type signature to read the “flow” of data from its source in the arguments, through other arguments, to its destination in the result. When you get used to this, all sorts of useful corollaries naturally arise. For example, the above fact means also “if xs is empty, then the result list must be empty”.

If you don’t know anything about it, you can’t look at it.

Within the body of def m[A, B](xs: List[A], f: A => B): List[B], there is exactly one special operation available for values of type A—they can be converted to B via calling f—and there are none for values of type B. You can’t compare them, so you can’t sort either the input list or the output list. You can’t take their hash code. You can’t add two As together to get a combined A, and the same goes for Bs. You could do something polymorphic like put an A and B in an (A, B) tuple, but that tells you nothing about those values.

In Scalazzi-safe programming, we supply information about types via typeclasses. For example, if we declared A: Order, there’s a way to compare As. If we declared B: Semigroup, there’s a way to append Bs. When thinking about what a typeclass constraint means for the “flow of data” in a function, you can think of a typeclass constraint as supplying the primitives of that typeclass as extra arguments to the function. (That is, after all, how Scala implements typeclass constraints.) For example, A: Order means that there’s an extra argument (A, A) => Ordering, where Ordering is the usual three-valued result of comparison, and the function is guaranteed to follow some special properties (the laws of the typeclass). B: Semigroup means that there’s an extra argument (B, B) => B, also guaranteed to follow its own special properties.

Naturally, if there are no typeclass constraints on a type variable, no such extra arguments are supplied; only the “ordinary” arguments provide capabilities for working with the type. Surprisingly, for all that is made of the importance of Scalaz’s typeclasses, this is by far the most common case.

You can’t “just crash”.

Consider the type signature def lp[L, R](i: L, s: L => Either[L, R]): R. The flow of data says that the result must have come from s returning a Right, and that s call’s argument must be either i, or a Left from a prior call to s. Moreover, you can safely expect lp to use the L produced by s each time after the first, rather than trying i or another previous L again; in functional programming, it would be absurd to try s(i) again and expect that it might return a Right, when you already know it previously returned a Left.

In particular, there’s no allowance for “timing out” or “duplicate L detection”. Timing out (in terms of a maximum number of s calls) would require a different return type, like Option[R]. Duplicate detection would require a constraint like L: Equal at minimum.

“It’s impossible to implement this signature” is not a reason to “implement” it by crashing; it’s a reason to not have a function with that signature at all. Writing correct type signatures is part of writing a correct type-safe program. When type signatures are only declared for functions that are possible to implement, and reading those type signatures can tell you what they are honestly doing, they start to become true machine-checked documentation.

The utilities of each typeclass are bound by the basics of that typeclass.

Many of the most useful functions in the Scalaz API are defined as typeclass utilities, so they can be most easily understood by keeping in mind those basics as you read the utility functions. So utilities in Functor must have been gotten by mapping, utilities in Foldable must have been gotten by folding, and so on.

For example, consider the utility under typeclass Functor[F[_]], def void[A](fa: F[A]): F[Unit]. What does this function do? If you guess based on its name void, or a poor analogy like “F is some kind of collection” (a common mistake when first approaching Functor), you might conclude something like “it must return an empty F” or “it must return an F with a single Unit in it”. Unfortunately, these “intuitive” answers are not only wrong, they don’t make sense.

Instead, think about void like this: def void[F[_]: Functor, A](fa: F[A]): F[Unit]. Just as with the Order and Semigroup constraints described above, that constraint manifests as its primitives; in this case, “a map function for Fs”. That leaves only one possibility, which happens to be the true behavior of void: the result is gotten by calling map on fa, with _ => () as the function argument.

Practice on the simple cases.

The above are a lot of words to describe a thinking process that is very fast in practice. With practice, it’s much faster to understand what a function does by reading its type than by reading a documentation comment. At the very least, a documentation comment should only be considered as secondary advice to the primary source, the type.

The functions that are easiest to understand purely with types are, unfortunately, the most likely to be fully documented. That makes relying solely on documentation comments more tempting, but this is a mistake as a Scalaz newcomer. If you practice reading only the types for simple functions like void, you’ll gain important practice for quickly understanding much more complex functions using the same techniques.