Thursday, June 11, 2020

Reading Scalaz API Functions (Principles 5, Scalaz Files)

The Scalaz API has a large number of functions, most of which have several things in common:

  1. They have type parameters.
  2. Their implementations are very short, typically one line.
  3. They do not have Scaladoc.

While they don’t have Scaladoc comments, their type signatures are still printed on the Scaladoc pages, and with some practice you can get a quick, accurate understanding of each function by taking into account two things:

  1. One of the goals of the Scalazzi Safe Scala Subset is that “Types Are Documentation”. For example, one of the rules is “no side-effects”. Side effects, by their nature, do not appear in type signatures of methods that create them; by banishing them, we eliminate that source of untyped program behavior.
  2. Heavy use of type parameters, and minimization of concrete information in each type signature, amplifies parametricity. For Scalaz API functions, there are far fewer factors to consider when working out what a type “means” than you must consider when reading typical Scala libraries.

Adding Scaladoc would no doubt be valuable to many users. However, these factors greatly diminish that value, and it’s difficult to justify the development cost of comprehensive Scaladoc when we are already encouraging users to think in terms of “Types Are Documentation”.

That doesn’t mean there aren’t some simple “tricks” and rules-of-thumb that you can learn to accelerate this process.

Values of a type variable cannot arise from thin air.

Suppose that a function has a type variable B, and this type variable is used in the return type, which might be something like (List[Int], List[B]). There is no way for the function to make up its own Bs, so all Bs appearing in that list must have come from the arguments. If there were no Bs in the arguments, then that List[B] must be empty.

This rule, “must come from the arguments”, is significantly more flexible than it sounds, while allowing the caller to preserve data integrity in a type-checked way. For example, the signature def m[A, B](xs: List[A], f: A => B): List[B] does not require the result list to be empty, because the arguments supply a “way to get Bs”:

  1. Take any element from the xs list.
  2. Call f on that element.

Since the body of m has no source of Bs other than calling f, and no source of As to pass to f other than xs, it’s a fact of the return value that “all elements must have come from calling f on elements of xs”.
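
As an illustration, here is a sketch (not necessarily the Scalaz source) of essentially the only way a body of m can produce its result; any variation can only drop, duplicate, or reorder elements obtained this way:

    def m[A, B](xs: List[A], f: A => B): List[B] =
      xs match {
        case Nil     => Nil               // no As to feed to f, so no Bs: the result must be empty
        case a :: as => f(a) :: m(as, f)  // every B comes from calling f on an element of xs
      }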

In this way, you can use a type signature to read the “flow” of data from its source in the arguments, through other arguments, to its destination in the result. When you get used to this, all sorts of useful corollaries naturally arise. For example, the above fact also means “if xs is empty, then the result list must be empty”.

If you don’t know anything about it, you can’t look at it.

Within the body of def m[A, B](xs: List[A], f: A => B): List[B], there is exactly one special operation available for values of type A—they can be converted to B via calling f—and there are none for values of type B. You can’t compare them, so you can’t sort either the input list or the output list. You can’t take their hash code. You can’t add two As together to get a combined A, and the same goes for Bs. You could do something polymorphic like put an A and B in an (A, B) tuple, but that tells you nothing about those values.

In Scalazzi-safe programming, we supply information about types via typeclasses. For example, if we declared A: Order, there’s a way to compare As. If we declared B: Semigroup, there’s a way to append Bs. When thinking about what a typeclass constraint means for the “flow of data” in a function, you can think of a typeclass constraint as supplying the primitives of that typeclass as extra arguments to the function. (That is, after all, how Scala implements typeclass constraints.) For example, A: Order means that there’s an extra argument (A, A) => Ordering, where Ordering is the usual three-valued result of comparison, and the function is guaranteed to follow some special properties (the laws of the typeclass). B: Semigroup means that there’s an extra argument (B, B) => B, also guaranteed to follow its own special properties.
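
As a sketch of that desugaring, with hypothetical functions g and gDesugared (Order and Semigroup are Scalaz’s), the two definitions below mean the same thing:

    import scalaz.{Order, Semigroup}

    // Context bounds supply the typeclass primitives...
    def g[A: Order, B: Semigroup](x: A, y: A, b: B): B =
      if (Order[A].lessThan(x, y)) b else Semigroup[B].append(b, b)

    // ...exactly as if they were extra implicit arguments, roughly an
    // (A, A) => Ordering and a (B, B) => B obeying the typeclass laws.
    def gDesugared[A, B](x: A, y: A, b: B)(implicit A: Order[A], B: Semigroup[B]): B =
      if (A.lessThan(x, y)) b else B.append(b, b)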

Naturally, if there are no typeclass constraints on a type variable, no such extra arguments are supplied; only the “ordinary” arguments provide capabilities for working with the type. Surprisingly, for all that is made of the importance of Scalaz’s typeclasses, this is by far the most common case.

You can’t “just crash”.

Consider the type signature def lp[L, R](i: L, s: L => Either[L, R]): R. The flow of data says that the result must have come from s returning a Right, and that the argument to that call of s must be either i, or a Left from a prior call to s. Moreover, you can safely expect lp to use the L produced by s each time after the first, rather than trying i or another previous L again; in functional programming, it would be absurd to try s(i) again and expect that it might return a Right when you already know it previously returned a Left.

In particular, there’s no allowance for “timing out” or “duplicate L detection”. Timing out (in terms of a maximum number of s calls) would require a different return type, like Option[R]. Duplicate detection would require a constraint like L: Equal at minimum.
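
Here is a sketch of the only reasonable implementation, following that flow (an illustration, not necessarily the Scalaz source):

    @scala.annotation.tailrec
    def lp[L, R](i: L, s: L => Either[L, R]): R =
      s(i) match {
        case Right(r) => r          // the only possible source of the result R
        case Left(l)  => lp(l, s)   // feed the new L back in; retrying i would be pointless
      }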

“It’s impossible to implement this signature” is not a reason to “implement” it by crashing; it’s a reason to not have a function with that signature at all. Writing correct type signatures is part of writing a correct type-safe program. When type signatures are only declared for functions that are possible to implement, and reading those type signatures can tell you what they are honestly doing, they start to become true machine-checked documentation.

The utilities of each typeclass are bound by the basics of that typeclass.

Many of the most useful functions in the Scalaz API are defined as typeclass utilities, so they can be understood most easily by keeping the basic operations of each typeclass in mind as you read them: utilities in Functor must have been gotten by mapping, utilities in Foldable must have been gotten by folding, and so on.

For example, consider the utility under typeclass Functor[F[_]], def void[A](fa: F[A]): F[Unit]. What does this function do? If you guess based on its name void, or a poor analogy like “F is some kind of collection” (a common mistake when first approaching Functor), you might conclude something like “it must return an empty F” or “it must return an F with a single Unit in it”. Unfortunately, these “intuitive” answers are not only wrong, they don’t make sense: Functor alone gives you no way to construct an empty F, nor an F holding exactly one element.

Instead, think about void like this: def void[F[_]: Functor, A](fa: F[A]): F[Unit]. Just as with the Order and Semigroup constraints described above, that constraint manifests as its primitives; in this case, “a map function for Fs”. That leaves only one possibility, which happens to be the true behavior of void: the result is gotten by calling map on fa, with _ => () as the function argument.
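
In code, that single possibility looks something like this (a sketch, assuming Scalaz’s Functor):

    import scalaz.Functor

    def void[F[_]: Functor, A](fa: F[A]): F[Unit] =
      Functor[F].map(fa)(_ => ())   // discard each A, keeping the F structure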

Practice on the simple cases.

The above are a lot of words to describe a thinking process that is very fast in practice. With practice, it’s much faster to understand what a function does by reading its type than by reading a documentation comment. At the very least, a documentation comment should only be considered as secondary advice to the primary source, the type.

The functions that are easiest to understand purely from their types are, unfortunately, also the most likely to be fully documented. That makes relying solely on documentation comments more tempting, but doing so is a mistake for a Scalaz newcomer. If you practice reading only the types of simple functions like void, you’ll gain important practice for quickly understanding much more complex functions using the same techniques.

Saturday, May 16, 2020

Global typeclass coherence (Principles 3, Scalaz Files)

In Scalaz, we provide at most one typeclass instance per type throughout your whole program, and we expect users to preserve that invariant. For any given type, there should be no way to get two incompatible instances. This is global coherence, and it is required to make Scala programs that use implicits for typeclasses understandable and maintainable.

If you want a different instance for the “same” type when working with Scalaz, the answer is always to start with a different type first. Then you can define your incompatible instance for the same structure, but on the different type, preserving coherence. Scalaz’s @@ type tags or the underlying existential newtype mechanism are convenient, flexible ways to get these “different” types.
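
For example, here is a sketch of giving Int a second, reversed ordering without breaking coherence; the tag name Desc and its instance are hypothetical, while Tag and @@ are Scalaz’s:

    import scalaz._, Scalaz._

    sealed trait Desc
    val Desc = Tag.of[Desc]

    // The same structure (Int), but a different type, so this incompatible
    // instance coexists with Order[Int] without breaking coherence.
    implicit val intDescOrder: Order[Int @@ Desc] =
      Order.order((x, y) => Order[Int].order(Desc.unwrap(y), Desc.unwrap(x)))

    val tagged: Int @@ Desc = Desc(42)   // wrap to opt in to the reversed ordering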

It’s not surprising that “new type” as a solution to this problem comes from Haskell, as Haskell too depends on global coherence. While we can’t get all the benefits from coherence that Haskell does, what remains is sufficient to justify this seemingly non-modular rule.

The invisible action of implicits in Scala is a serious problem for understanding and maintenance when they are used in an undisciplined manner. Consider, for example, an incoherent implicit-based design: scala-library’s ExecutionContext.

At any point in the program, the ExecutionContext resolved depends on what variables are in scope. It’s unsafe to move any code that depends on ExecutionContext to some other method, because that set of variables can change, thus changing the behavior of the program. And you can’t determine at a glance whether some code depends on ExecutionContext, so moving any code carries some risk.

You can’t add an ec: ExecutionContext argument anywhere without potentially breaking working code, because it changes that set of variables. It’s only safe to introduce brand-new methods with that argument.
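
A sketch of the hazard, with hypothetical method names:

    import scala.concurrent.{ExecutionContext, Future}

    def handler(implicit blockingEc: ExecutionContext): Future[Int] =
      Future(21 * 2)   // runs on blockingEc, the only ExecutionContext in scope

    // Extract the same expression into a method with a different implicit
    // ExecutionContext in scope, and the behavior silently changes:
    def extracted(implicit cpuEc: ExecutionContext): Future[Int] =
      Future(21 * 2)   // now runs on cpuEc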

If you are refactoring and suddenly get an error about multiple conflicting ExecutionContexts in scope, you have no help in determining which is the “right” one; you have to figure it out from what the probable intent of the original code was. Possibly, none of the options in scope is right.

By contrast, consider Monoid[List[N]], where N might be some type variable in scope. The correct instance depends on the type, not on what variables are in scope. So you can add a Semigroup[N] constraint and know the answer won’t change. You can split up the method, or move any of its code anywhere else, and know the answer won’t change. You can even add a Monoid[List[N]] argument to your function, because you know the caller is required to come up with the same instance you were working with before.
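
Here is a sketch of that reasoning, with hypothetical function names, assuming Scalaz’s Monoid, Semigroup, and List instances:

    import scalaz._, Scalaz._

    // The Monoid[List[N]] resolved here depends only on the type List[N]...
    def combineAll[N](xss: List[List[N]]): List[N] =
      xss.foldLeft(Monoid[List[N]].zero)((acc, xs) => Monoid[List[N]].append(acc, xs))

    // ...so adding a Semigroup[N] constraint, or moving the code into
    // another method, cannot change which instance is used.
    def combineAllToo[N: Semigroup](xss: List[List[N]]): List[N] =
      combineAll(xss)   // the very same Monoid[List[N]]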

You can add or delete constraints as required, because they’re always going to be fulfilled by the same instances. For example, Scalaz’s sorted map K ==>> V doesn’t carry the K comparator around, because we can assume it always depends on K and only on K.

If you ever get an error about multiple conflicting instances, you know there’s always a workable solution: choose one, because they’re the same.

Tools try to help with understanding Scala’s implicit resolution, but they don’t always work. Global coherence is your greatest ally when working out an implicit resolution by hand: you know that any path to the instance you find is a correct one, and from there you can work out why the compiler isn’t resolving it.

Global coherence also lets Scalaz offer a much simpler API. For example, the efficient union of sorted maps requires that the ordering of K used for both is equal. With “local” (read: incoherent) instances, the only safe way to do this is to define a singleton type depending on the ordering, and treat this singleton type as a sort of third type parameter to the map type. If you happen to have built two maps with the same key type where you didn’t use polymorphism to unify their use of that singleton type, too bad, you can’t safely union them. With global coherence, because the instance only depends on K, the simple two-parameter map type is perfectly sufficient, and those maps are easy to union.
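
A brief sketch, assuming Scalaz’s ==>> with its fromList and union operations (the map contents are hypothetical):

    import scalaz._, Scalaz._

    // Both maps are built with the single, coherent Order[String], so the
    // simple two-parameter map type is enough and union is safe.
    val m1: String ==>> Int = ==>>.fromList(List("a" -> 1, "b" -> 2))
    val m2: String ==>> Int = ==>>.fromList(List("b" -> 20, "c" -> 3))
    val merged: String ==>> Int = m1 union m2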

The “flexibility” of local instances is not worth it given the constraints of Scala, and Scalaz assumes you won’t be using them when you use its functionality. Define a newtype instead.