Failed Experiments: The missing diamond of Scala variance

This article is an expanded version of my LambdaConf 2016 talk of the same name (video, slides). The below covers every topic from that talk, and much more on possible extensions to variance polymorphism, but the talk is a gentler introduction to the main concepts here.

As part of its subtyping system, Scala features variance. Using variance lets you lift the subtyping relation into type constructors, just like type equality (~) already does.

type Endo[A] = A => A     // invariant
type Get[+A] = Foo => A   // covariant
type Put[-A] = A => Foo   // contravariant

X ~ Y → Endo[X] ~ Endo[Y] // invariant
X <: Y → Get[X] <: Get[Y] // covariant
X <: Y → Put[Y] <: Put[X] // contravariant
             ↑reversed!↑

Subtyping is incomplete without variance

With simple type equality, you have four properties:

Reflexivity: A ~ A
Symmetry: A ~ B → B ~ A
Transitivity: A ~ B ∧ B ~ C → A ~ C
Congruence: A ~ B → F[A] ~ F[B]

Just try to use GADTs without equality congruence! That’s what’s expected in a subtyping system without variance.

Reflexivity: A <: A
Antisymmetry: A <: B ∧ B <: A → A = B
Transitivity: A <: B ∧ B <: C → A <: C
Congruence: A <: B → Put[B] <: Put[A]

Completing subtyping: variables

val aCat = Cat("Audrey")
val anAnimal: Animal = aCat

A bare type is in a covariant position. You can’t abstract over something as simple as a value box without variance.

Completing subtyping: the harmony of a function call

def speak(a: Animal): IO[Unit]
speak(aCat)

This is how functions and their arguments form a more perfect union. One way to think of the mechanics here is that Cat upcasts to Animal. But there’s no way to tell that what is really happening isn’t that the function is upcasting from Animal => IO[Unit] to Cat => IO[Unit]! Or that they aren’t meeting in the middle somewhere; maybe Cat upcasts to Mammal, and Animal => IO[Unit] upcasts to Mammal => IO[Unit].

So I don’t think there’s really subtyping without variance; there’s just failing to explicitly model variance that is there anyway. Since you cannot have subtyping without variance, if variance is too complicated, so is subtyping.

What else is there? What else is needed?

There is one advanced feature that fans of higher-kinded types would have found a deal-breaker to live without in Scala, even if they are unaware of its existence.

def same[A, F[_]](fa: F[A]): F[A] = fa
def widen[A, B >: A, F[+_]](fa: F[A]): F[B] = fa

All of Endo, Get, and Put can be passed as the F type parameter to same. However, only Get can be passed as the F type parameter to widen. This works because you can only “use” variance if you know you have it, but don’t need to make any assumptions about your type constructors’ variances if you don’t “use” it.

Variance exhibits a subkinding relation

same takes an invariant type constructor F, but you are free to pass covariant and contravariant type constructors to it. That’s because there is a subtype relationship between these at the type level, or a subkind relationship.

Invariance is the ‘top’ variance, the most abstract, and co- and contravariance are its subvariances. When you pass a covariant or contravariant type constructor as F to same, its variance “widens”.

Because variance is part of the “type of type constructor”, not the specific parameter where the variance annotation appears, it’s an element of the kind of that type constructor, just as arity is. For example, when we talk about the kind of Get , we don’t say that A has a covariant kind, because this variance has nothing to do with A. Instead, we say that Get has kind +* -> *, because this particular variance annotation is all about the behavior of types referring to Get. Moreover, subvariance is just a restricted flavor of subkind.

Flipping variances

I started to find it odd that this order of subclassing was enforced as a result of the subvariance relation.

mutable.Seq[A] extends Seq[+A]

CovCoy[F[+_]] extends Coy[F[_]]

It makes perfect sense empirically, though; you can easily derive unsafeCoerce if you assume this exact ordering isn’t enforced.

Then I found, while working on monad transformers, that this is the only way that makes sense, too.

InvMT[T[_[_]]] extends CovMT[T[_[+_]]]

CovMTT[W[_[_[+_]]]] extends InvMTT[W[_[_[_]]]]

I had seen this before.

Type parameter positions are variance-contravariant

The kinds of type constructors (type-level functions) work like the types of value-level functions. Both sorts of functions are contravariant in the parameter position. So every extra layer of nesting, “flips” the variance, just like with functions. Below is the version of this for value-level functions, akin to the examples above.

type One[-A] = A => Z
type Two[+A] = (A => Z) => Z
type Three[-A] = ((A => Z) => Z) => Z
type Four[+A] = (((A => Z) => Z) => Z) => Z

A bottom variance: the diamond, completed

Scala wisely included a bottom type, Nothing, to go with its top type Any. That helps complete its subtyping system, but a bottom variance was unfortunately left out of its subkinding system. Here’s something that might work.

type ConstI[👻A] = Int

ConstI[A] ~ ConstI[B] // phantom, or 👻

This is exactly what you’d get if you applied both the covariant and contravariant rules to a type parameter. Therefore, phantom variance or anyvariance is more specific than either, and as the ‘bottom’ variance, completes the diamond. It is the perfect choice, because it is truly a greatest lower bound; it is both more specific than either by itself and no more specific than both together.

Whence monad transformer variance?

There are two competing flavors of the monad transformers, like OptionT.

final case class NewOptionT[F[_],   A](run: F[Option[A]])

final case class OldOptionT[F[+_], +A](run: F[Option[A]])

The first remains invariant over A, but conveniently doesn’t care about the variance of F—its declared variance is the “top” variance. The latter gives the more specific covariance over A, but requires the F to be covariant (or phantom), which can be very inconvenient. These two transformers can’t be practically unified.

If you look at the structure, the variance of A’s position is always of a piece with the variance of F.

Variance variables

final case class OptionT[😕V, F[V😕_], V😕A]

The “variance variable” V is declared with the syntactic marker 😕.
😕 appears infix before _ to say “F has the variance V for the type parameter in this position”.
😕 appears infix before A to say “the parameter A has the variance V”.

Therefore, when you specify variance V = +, the remaining parameters have kind F[+_] and +A, and similarly for other variances.

I’ve chosen the confused smiley 😕 to represent likely reactions to this idea, and especially to my complete lack of consideration for elegant syntax, but it’s really just a limited form of a kind variable in PolyKinds—after all, variance is part of the kind of the surrounding type constructor. And, as seen above, we already have subkind-polymorphism with respect to variance, so what’s wrong with parametric kind polymorphism? “Variance variables” are just parametric kind polymorphism, but not as powerful.

Variance bounds

The OptionT example is a simple case; it supports all possible variances, and features a simple relationship: whatever you select for V is exactly the variance used at the various places V appears. It’s easy to break this simple scheme, though.

final case class WrenchT[F[_], A](run: F[Option[A]], wrench: A)

Now the variance of the A position isn’t strictly the variance of F, as it was with OptionT; it can only be covariant or invariant, due to the wrench being in covariant position.

Well, we have variables over something with a conformance relation, let’s add bounds!

final case class WrenchT[😕V >: +, F[V😕_], V😕A]

And these bounds themselves could be determined by other variance variables, &c, but I don’t want to dwell on that because the complications aren’t over yet.

No easy unification

final case class Compose[F[_], G[_], A](run: F[G[A]])

This type complicates things even further! There are numerous possibilities based on the variances of F and G.

F,GG,FA  
InvInvInv
Inv+  Inv
Inv-  Inv
Inv👻  👻  
+  +  +  
+  -  -  
+  👻  👻  
-  -  +  
-  👻  👻  
👻  👻  👻  

F,G	G,F	A
Inv	Inv	Inv
Inv	+	Inv
Inv	-	Inv
Inv	👻	👻
+	+	+
+	-	-
+	👻	👻
-	-	+
-	👻	👻
👻	👻	👻

(The order of F and G doesn’t matter here, so I’ve left out the reverses here.) So the variance of A’s position is the result of multiplying the variance of F and G. The multiplication table is just above. Guess we need a notation for that.

[😕FV, 😕GV, 😕V >: FV × GV, F[FV😕_], G[GV😕_], V😕A]

The bound here only means that V is in variance-covariant position, but bounded by FV × GV.

Another wrench

We can privilege F a little bit to make things more interesting.

final case class ComposeWr[F[_], G[_], A](run: F[G[A]], fa: F[A])

F×G	F	A
	Inv,+,-	Inv
Inv,+,-,👻	Inv	Inv
+,👻	+	+
-	+	Inv
-,👻	-	-
+	-	Inv
👻	👻	👻

Here’s another level to the function that determines the A-position variance: it’s now lub(F×G, F), where lub is the least upper bound, the most specific variance that still holds both arguments as subvariances.

Variance families? Really?

I hope you guessed where I was going when I said “function”: the rules determining the lower bound on the A-position variance can be specified by the programmer with a variance-level function—a mapping where the arguments and results are variances—or a variance family. Again, I don’t think this is terribly novel; it’s just a kind family, but more restricted.

You’d want it closed and total, and I can see the Haskell now.

variance family LUBTimes a b where
  LUBTimes - + = -
  LUBTimes 👻 Inv = 👻
  …

Only because I’m not sure where to begin making up the Scala syntax.

There are four variances

When I started looking for ways to describe the variances of the datatypes I’ve shown you, I noticed the relationship between variance and kinds, converted the problem to a kind-level problem, and started thinking of solutions in terms of PolyKinds. That’s where variance variables come from, and everything else follows from those.

However, I think I’ve made a mistake. Not with variance variables themselves, mind you, nor the variance-conformance already built in to Scala. But, to deal with the problems that arise with only these, I’ve hypothesized tools—described above—that are way too powerful. They work on open domains of unbounded complexity—the kinds of types—and there are only four variances.

There are two reasons I think there must be a better approach.

First, there are finitely many variance families for a given sort.

Second, there are only so many ways to put variables in positions of variance. You get argument and result position of methods, within other type constructors, and that’s it. Things are simple enough that, discounting desire for working GADTs (which will be discussed in a later post, “Choosing variance for a phantom type”), it is always possible to infer which of the four variances a type parameter ought to have, in a first-order variance situation.

A “good” polyvariance constraint system

Since there are only four variances, and only so many ways to combine them, it might be possible to design something more refined and suited for the task than the too-powerful variance families.

It might be that there are only a few useful variance relations, like × and lub, and a good solution would be to supply these relations along with an expression model to combine them. Or maybe not. Instead, I’ll stop hypothesizing and instead say what a I think a “good” system would look like.

It must be writable. Just as it is desirable to write a stronger type role than the inferred one in GHC Haskell ≥ 7.8, there are very common reasons to want a more general variance than the one that would be inferred. So the convenience of writing out the rule explicitly matters a great deal.
It must be checkable. For variance variables, that means every possible variance you can choose puts every type parameter only in positions consistent with its variance. For example, our fully generalized OptionT always places A only in positions matching the variance of the F type constructor.
We can just check every possible variance—up to four for each variable—but I think this is the wrong way to go. We don’t just enumerate over every possible type to check type parameters—that would take forever—we have a systematic way to check exactly one time, with skolemization. Variance is simpler—it should be an easier problem.
Not a requirement—but it ought to be inferrable. In the same way that skolemization gives us a path to automatic generalization, if there is a similar way to do quantified variance checking, it should be possible to use the output of that decision procedure to determine the relationships between and bounds on variance variables.
How that decision is expressed is another question.

Variance & GHC type roles

It might seem that Haskell, with no subtyping, might not care about this problem. But GHC 7.8 type roles are similar enough to variances; the main difference is that Scala variance is about the congruence/liftability of the strong conformance relation, while type roles are about the congruence/liftability of the weak type equality/“coercibility” relation.

nominal:          a ~ b → f a ~ f b
representational: a ~_w b → f a ~_w f b
phantom:          f a ~_w f b

This is pretty useful from a practical, performance-minded perspective, but the problem is

newtype MaybeT m a = MaybeT (m (Maybe a))

there is no way to describe the role of the a parameter in the most general way. It’s stuck at nominal, even if m’s parameter is representational.

Just as the integration of variance and higher kinds in Scala is incomplete without something like the polyvariance system I’ve been describing, Haskell’s type roles are not fully integrated with higher kinds.

I hope if one of these language communities finds a good solution, it is adopted by the other posthaste. The Haskell community is attempting to tackle these problems with roles; perhaps Scala can profit from its innovations. Much of the prose on the GHC matter can be profitably read by replacing “role” with “variance” where it appears. For example, this should sound familiar.

This design incorporates roles into kinds. It solves the exact problems here, but at great cost: because roles are attached to kinds, we have to choose a types roles in the wrong place. For example, consider the Monad class. Should the parameter m have type */R -> *, requiring all monads to take representational arguments, or should it have type */N -> *, disallowing GND if join is in the Monad class? We’re stuck with a different set of problems.

Subtyping or higher kinds?

As things stand, you will have a little trouble combining heavy use of subtyping and of higher kinds in the same system.

I’m not saying for certain that it comes down to one or the other. In Scalaz, we weakened support for subtyping to have better support for higher kinds, because its users typically do not want to use subtyping. However, this preference doesn’t generalize to the Scala community at large. This was only a real concern for monad transformers; most Scalaz constructs, for now, have fine subtyping support.

My suggestion is that you should favor higher kinds; they’re a more powerful abstraction mechanism, and ultimately easier to understand, than subtyping; they also happen to be less buggy in Scala. If you must use subtyping, be warned: it’s much more complex than it first seems.

Failed Experiments

Friday, September 16, 2016

The missing diamond of Scala variance