*This article is an expanded version of my LambdaConf 2016 talk of the
same name
(video, slides). The
below covers every topic from that talk, and much more on possible
extensions to variance polymorphism, but the talk is a gentler
introduction to the main concepts here.*

As part of its subtyping system, Scala features *variance*. Using
variance lets you lift the subtyping relation into type constructors,
just like type equality (`~`

) already does.

type Endo[A] = A => A // invariant type Get[+A] = Foo => A // covariant type Put[-A] = A => Foo // contravariant X ~ Y → Endo[X] ~ Endo[Y] // invariant X <: Y → Get[X] <: Get[Y] // covariant X <: Y → Put[Y] <: Put[X] // contravariant ↑reversed!↑

## Subtyping is incomplete without variance

With simple type equality, you have four properties:

- Reflexivity:
*A*~*A* - Symmetry:
*A*~*B*→*B*~*A* - Transitivity:
*A*~*B*∧*B*~*C*→*A*~*C* - Congruence:
*A*~*B*→*F*[*A*] ~*F*[*B*]

Just try to use GADTs without equality congruence! That’s what’s expected in a subtyping system without variance.

- Reflexivity:
*A*<:*A* - Antisymmetry:
*A*<:*B*∧*B*<:*A*→*A*=*B* - Transitivity:
*A*<:*B*∧*B*<:*C*→*A*<:*C* - Congruence:
*A*<:*B*→`Put`

[*B*] <:`Put`

[*A*]

### Completing subtyping: variables

val aCat = Cat("Audrey") val anAnimal: Animal = aCat

A bare type is in a covariant position. You can’t abstract over something as simple as a value box without variance.

### Completing subtyping: the harmony of a function call

def speak(a: Animal): IO[Unit] speak(aCat)

This is how functions and their arguments form a more perfect
union. One way to think of the mechanics here is that `Cat`

upcasts to
`Animal`

. But there’s no way to tell that what is really happening
isn’t that the *function* is upcasting from `Animal => IO[Unit]`

to
`Cat => IO[Unit]`

! Or that they aren’t meeting in the middle
somewhere; maybe `Cat`

upcasts to `Mammal`

, and `Animal => IO[Unit]`

upcasts to `Mammal => IO[Unit]`

.

So I don’t think there’s really subtyping without variance; there’s just failing to explicitly model variance that is there anyway. Since you cannot have subtyping without variance, if variance is too complicated, so is subtyping.

## What else is there? What else is needed?

There is one advanced feature that fans of higher-kinded types would have found a deal-breaker to live without in Scala, even if they are unaware of its existence.

def same[A, F[_]](fa: F[A]): F[A] = fa def widen[A, B >: A, F[+_]](fa: F[A]): F[B] = fa

All of `Endo`

, `Get`

, and `Put`

can be passed as the `F`

type
parameter to `same`

. However, only `Get`

can be passed as the `F`

type
parameter to `widen`

. This works because you can only “use” variance
if you know you have it, but don’t need to make any assumptions about
your type constructors’ variances if you don’t “use” it.

## Variance exhibits a subkinding relation

`same`

takes an invariant type constructor `F`

, but you are free to
pass covariant and contravariant type constructors to it. That’s
because there is a subtype relationship between these at the type
level, or a *subkind* relationship.

Invariance is the ‘top’ variance, the most abstract, and co- and
contravariance are its *subvariances*. When you pass a covariant or
contravariant type constructor as `F`

to `same`

, its variance
“widens”.

Because **variance is part of the “type of type constructor”**, not
the specific parameter where the variance annotation appears, it’s an
element of the kind of that type constructor, just as arity is. For
example, when we talk about the kind of `Get`

, we don’t say that `A`

has a covariant kind, because this variance has nothing to do with
`A`

. Instead, we say that `Get`

has kind `+* -> *`

, because this
particular variance annotation is all about the behavior of types
referring to `Get`

. Moreover, subvariance is just a restricted flavor
of subkind.

### Flipping variances

I started to find it odd that this order of subclassing was enforced as a result of the subvariance relation.

mutable.Seq[A] extends Seq[+A] CovCoy[F[+_]] extends Coy[F[_]]

It makes perfect sense empirically, though; you can easily derive
`unsafeCoerce`

if you assume this exact ordering isn’t enforced.

Then I found, while working on monad transformers, that *this* is the
only way that makes sense, too.

InvMT[T[_[_]]] extends CovMT[T[_[+_]]] CovMTT[W[_[_[+_]]]] extends InvMTT[W[_[_[_]]]]

I had seen this before.

### Type parameter positions are variance-contravariant

The kinds of type constructors (type-level functions) work like the types of value-level functions. Both sorts of functions are contravariant in the parameter position. So every extra layer of nesting, “flips” the variance, just like with functions. Below is the version of this for value-level functions, akin to the examples above.

type One[-A] = A => Z type Two[+A] = (A => Z) => Z type Three[-A] = ((A => Z) => Z) => Z type Four[+A] = (((A => Z) => Z) => Z) => Z

### A bottom variance: the diamond, completed

Scala wisely included a bottom type, `Nothing`

, to go with its top
type `Any`

. That helps complete its subtyping system, but a bottom
variance was unfortunately left out of its subkinding system. Here’s
something that might work.

type ConstI[ðŸ‘»A] = Int ConstI[A] ~ ConstI[B] // phantom, or ðŸ‘»

This is exactly what you’d get if you applied both the covariant and
contravariant rules to a type parameter. Therefore, *phantom variance*
or *anyvariance* is more specific than either, and as the ‘bottom’
variance, completes the diamond. It is the perfect choice, because it
is truly a *greatest lower bound*; it is both more specific than
either by itself and *no more* specific than both together.

## Whence monad transformer variance?

There are two competing flavors of the monad transformers, like
`OptionT`

.

final case class NewOptionT[F[_], A](run: F[Option[A]]) final case class OldOptionT[F[+_], +A](run: F[Option[A]])

The first remains invariant over `A`

, but conveniently doesn’t care
about the variance of `F`

—its declared variance is the “top”
variance. The latter gives the more specific covariance over `A`

, but
requires the `F`

to be covariant (or phantom), which can be very
inconvenient. These two transformers can’t be practically unified.

If you look at the structure, the variance of `A`

’s position is always
of a piece with the variance of `F`

.

### Variance variables

final case class OptionT[ðŸ˜•V, F[VðŸ˜•_], VðŸ˜•A]

- The “variance variable”
`V`

is declared with the syntactic marker ðŸ˜•. - ðŸ˜• appears infix before
`_`

to say “`F`

has the variance`V`

for the type parameter in this position”. - ðŸ˜• appears infix before
`A`

to say “the parameter`A`

has the variance`V`

”.

Therefore, when you specify variance `V = +`

, the remaining parameters
have kind `F[+_]`

and `+A`

, and similarly for other variances.

I’ve chosen the confused smiley ðŸ˜• to represent likely reactions to
this idea, and especially to my complete lack of consideration for
elegant syntax, but it’s really just a limited form of a kind variable
in
`PolyKinds`

—after
all, variance is part of the kind of the surrounding type
constructor. And, as seen above, we already have subkind-polymorphism
with respect to variance, so what’s wrong with parametric kind
polymorphism? “Variance variables” are just parametric kind
polymorphism, but not as powerful.

### Variance bounds

The `OptionT`

example is a simple case; it supports all possible
variances, and features a simple relationship: whatever you select for
`V`

is exactly the variance used at the various places `V`

appears. It’s easy to break this simple scheme, though.

final case class WrenchT[F[_], A](run: F[Option[A]], wrench: A)

Now the variance of the `A`

position isn’t strictly the variance of
`F`

, as it was with `OptionT`

; it can only be covariant or invariant,
due to the wrench being in covariant position.

Well, we have variables over something with a conformance relation, let’s add bounds!

final case class WrenchT[ðŸ˜•V >: +, F[VðŸ˜•_], VðŸ˜•A]

And these bounds themselves could be determined by other variance variables, &c, but I don’t want to dwell on that because the complications aren’t over yet.

## No easy unification

final case class Compose[F[_], G[_], A](run: F[G[A]])

This type complicates things even further! There are numerous
possibilities based on the variances of `F`

and `G`

.

F,G | G,F | A |
---|---|---|

Inv | Inv | Inv |

Inv | + | Inv |

Inv | - | Inv |

Inv | ðŸ‘» | ðŸ‘» |

+ | + | + |

+ | - | - |

+ | ðŸ‘» | ðŸ‘» |

- | - | + |

- | ðŸ‘» | ðŸ‘» |

ðŸ‘» | ðŸ‘» | ðŸ‘» |

(The order of `F`

and `G`

doesn’t matter here, so I’ve left out the
reverses here.) So the variance of `A`

’s position is the result of
*multiplying the variance* of `F`

and `G`

. The multiplication table is
just above. Guess we need a notation for that.

[ðŸ˜•FV, ðŸ˜•GV, ðŸ˜•V >: FV × GV, F[FVðŸ˜•_], G[GVðŸ˜•_], VðŸ˜•A]

The bound here only means that `V`

is in variance-covariant position,
but bounded by `FV × GV`

.

### Another wrench

We can privilege `F`

a little bit to make things more interesting.

final case class ComposeWr[F[_], G[_], A](run: F[G[A]], fa: F[A])

F×G | F | A |
---|---|---|

Inv,+,- | Inv | |

Inv,+,-,ðŸ‘» | Inv | Inv |

+,ðŸ‘» | + | + |

- | + | Inv |

-,ðŸ‘» | - | - |

+ | - | Inv |

ðŸ‘» | ðŸ‘» | ðŸ‘» |

Here’s another level to the function that determines the `A`

-position
variance: it’s now `lub(F×G, F)`

, where *lub* is the *least upper
bound*, the most specific variance that still holds both arguments as
subvariances.

### Variance families? Really?

I hope you guessed where I was going when I said “function”: the rules
determining the lower bound on the `A`

-position variance can be
specified by the programmer with a variance-level function—a mapping
where the arguments and results are variances—or a *variance
family*. Again, I don’t think this is terribly novel; it’s just a kind
family, but more restricted.

You’d want it closed and total, and I can see the Haskell now.

variance family LUBTimes a b where LUBTimes - + = - LUBTimes ðŸ‘» Inv = ðŸ‘» …

Only because I’m not sure where to begin making up the Scala syntax.

## There are four variances

When I started looking for ways to describe the variances of the
datatypes I’ve shown you, I noticed the relationship between variance
and kinds, converted the problem to a kind-level problem, and started
thinking of solutions in terms of `PolyKinds`

. That’s where variance
variables come from, and everything else follows from those.

However, I think I’ve made a mistake. Not with variance variables themselves, mind you, nor the variance-conformance already built in to Scala. But, to deal with the problems that arise with only these, I’ve hypothesized tools—described above—that are way too powerful. They work on open domains of unbounded complexity—the kinds of types—and there are only four variances.

There are two reasons I think there must be a better approach.

First, there are finitely many variance families for a given sort.

Second, there are only so many ways to put variables in positions of
variance. You get argument and result position of methods, within
other type constructors, and that’s it. Things are simple enough that,
discounting desire for working GADTs (which will be discussed in a
later post, “Choosing variance for a phantom type”), it is always
possible to *infer* which of the four variances a type parameter ought
to have, in a first-order variance situation.

### A “good” polyvariance constraint system

Since there are only four variances, and only so many ways to combine them, it might be possible to design something more refined and suited for the task than the too-powerful variance families.

It might be that there are only a few useful variance relations, like × and lub, and a good solution would be to supply these relations along with an expression model to combine them. Or maybe not. Instead, I’ll stop hypothesizing and instead say what a I think a “good” system would look like.

- It must be
**writable**. Just as it is desirable to write a stronger type role than the inferred one in GHC Haskell ≥ 7.8, there are very common reasons to want a more general variance than the one that would be inferred. So the convenience of writing out the rule explicitly matters a great deal. - It must be
**checkable**. For variance variables, that means every possible variance you can choose puts every type parameter only in positions consistent with its variance. For example, our fully generalized`OptionT`

always places`A`

only in positions matching the variance of the`F`

type constructor.We

*can*just check every possible variance—up to four for each variable—but I think this is the wrong way to go. We don’t just enumerate over every possible type to check type parameters—that would take forever—we have a systematic way to check exactly one time, with skolemization. Variance is simpler—it should be an easier problem. - Not a requirement—but it ought to be
**inferrable**. In the same way that skolemization gives us a path to automatic generalization, if there is a similar way to do quantified variance checking, it should be possible to use the output of that decision procedure to determine the relationships between and bounds on variance variables.How that decision is expressed is another question.

## Variance & GHC type roles

It might seem that Haskell, with no subtyping, might not care about
this problem. But GHC 7.8
*type roles* are
similar enough to variances; the main difference is that Scala
variance is about the congruence/liftability of the
strong conformance relation,
while type roles are about the congruence/liftability of the
weak type equality/“coercibility” relation.

nominal: a ~ b → f a ~ f b representational: a ~_{w}b → f a ~_{w}f b phantom: f a ~_{w}f b

This is pretty useful from a practical, performance-minded perspective, but the problem is

newtype MaybeT m a = MaybeT (m (Maybe a))

there is no way to describe the role of the `a`

parameter in the most
general way. It’s stuck at `nominal`

, even if `m`

’s parameter is
`representational`

.

Just as the integration of variance and higher kinds in Scala is incomplete without something like the polyvariance system I’ve been describing, Haskell’s type roles are not fully integrated with higher kinds.

I hope if one of these language communities finds a good solution, it is adopted by the other posthaste. The Haskell community is attempting to tackle these problems with roles; perhaps Scala can profit from its innovations. Much of the prose on the GHC matter can be profitably read by replacing “role” with “variance” where it appears. For example, this should sound familiar.

This design incorporates roles into kinds. It solves the exact problems here, but at great cost: because roles are attached to kinds, we have to choose a types roles in the wrong place. For example, consider the

`Monad`

class. Should the parameter`m`

have type`*/R -> *`

, requiring all monads to take representational arguments, or should it have type`*/N -> *`

, disallowing GND if`join`

is in the`Monad`

class? We’re stuck with a different set of problems.

## Subtyping or higher kinds?

As things stand, you will have a little trouble combining heavy use of subtyping and of higher kinds in the same system.

I’m not saying for certain that it comes down to one or the other. In Scalaz, we weakened support for subtyping to have better support for higher kinds, because its users typically do not want to use subtyping. However, this preference doesn’t generalize to the Scala community at large. This was only a real concern for monad transformers; most Scalaz constructs, for now, have fine subtyping support.

My suggestion is that you should favor higher kinds; they’re a more powerful abstraction mechanism, and ultimately easier to understand, than subtyping; they also happen to be less buggy in Scala. If you must use subtyping, be warned: it’s much more complex than it first seems.

## Further reading

- “Choosing variance for a phantom type” builds upon this post to connect variance and pattern matching
- “Of variance and functors” by Adelbert Chang, provides some foundational intuition for variance
- SI-2066 “Unsoundness in overriding methods with higher‐order type parameters”, easily the most important surfacing of the interaction of higher kinds and variance, revealing the existence of subvariance
- The Liskov type, which builds every other subtyping property on reflexivity and contravariance