This article is an expanded version of my LambdaConf 2016 talk of the same name (video, slides). The text below covers every topic from that talk, plus much more on possible extensions to variance polymorphism, but the talk is a gentler introduction to the main concepts here.
As part of its subtyping system, Scala features variance. Using variance lets you lift the subtyping relation into type constructors, just like type equality (`~`) already does.
```scala
type Endo[A] =  A  => A   // invariant
type Get[+A] = Foo => A   // covariant
type Put[-A] =  A  => Foo // contravariant
```

```
X ~ Y  → Endo[X] ~ Endo[Y]   // invariant
X <: Y → Get[X] <: Get[Y]    // covariant
X <: Y → Put[Y] <: Put[X]    // contravariant  ↑reversed!↑
```
## Subtyping is incomplete without variance
With simple type equality, you have four properties:
- Reflexivity: `A ~ A`
- Symmetry: `A ~ B → B ~ A`
- Transitivity: `A ~ B ∧ B ~ C → A ~ C`
- Congruence: `A ~ B → F[A] ~ F[B]`
Just try to use GADTs without equality congruence! That’s what you’re expected to put up with in a subtyping system without variance. Subtyping has its own analogous properties:
- Reflexivity: `A <: A`
- Antisymmetry: `A <: B ∧ B <: A → A = B`
- Transitivity: `A <: B ∧ B <: C → A <: C`
- Congruence: `A <: B → Put[B] <: Put[A]` (checked in the sketch below)
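Here is a minimal compiler check of the congruence row. `Animal`, `Cat`, and `Foo` are hypothetical stand-ins, not definitions from the talk:

```scala
trait Foo
trait Animal
final case class Cat(name: String) extends Animal

type Put[-A] = A => Foo

val putAnimal: Put[Animal] = (_: Animal) => new Foo {}
val putCat: Put[Cat] = putAnimal // Cat <: Animal, so Put[Animal] <: Put[Cat]
```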
## Completing subtyping: variables
```scala
val aCat = Cat("Audrey")
val anAnimal: Animal = aCat
```
A bare type is in a covariant position. You can’t abstract over something as simple as a value box without variance.
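To make “value box” concrete, here’s a minimal sketch; the `Cell` name is my own invention:

```scala
// Without the +, the second assignment is rejected, even though it is
// exactly as safe as `val anAnimal: Animal = aCat` above.
final case class Cell[+A](value: A)

val catCell: Cell[Cat] = Cell(Cat("Audrey"))
val animalCell: Cell[Animal] = catCell // requires covariance in A
```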
## Completing subtyping: the harmony of a function call
```scala
def speak(a: Animal): IO[Unit]

speak(aCat)
```
This is how functions and their arguments form a more perfect union. One way to think of the mechanics here is that `Cat` upcasts to `Animal`. But there’s no way to tell that what is really happening isn’t that the function is upcasting from `Animal => IO[Unit]` to `Cat => IO[Unit]`! Or that they aren’t meeting in the middle somewhere; maybe `Cat` upcasts to `Mammal`, and `Animal => IO[Unit]` upcasts to `Mammal => IO[Unit]`.
So I don’t think there’s really subtyping without variance; there’s just failing to explicitly model the variance that is there anyway. Since you cannot have subtyping without variance, if variance is too complicated, so is subtyping.
## What else is there? What else is needed?
There is one advanced feature that fans of higher-kinded types would have found a deal-breaker to live without in Scala, even if they are unaware of its existence.
```scala
def same[A, F[_]](fa: F[A]): F[A] = fa
def widen[A, B >: A, F[+_]](fa: F[A]): F[B] = fa
```
All of `Endo`, `Get`, and `Put` can be passed as the `F` type parameter to `same`. However, only `Get` can be passed as the `F` type parameter to `widen`. This works because you can only “use” variance if you know you have it, but you don’t need to make any assumptions about your type constructors’ variances if you don’t “use” it.
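For instance, with the aliases from the top of the article, these uses should check; a hedged sketch, reusing the hypothetical `Foo`/`Cat`/`Animal` from earlier:

```scala
val e: Endo[Int] = identity
same[Int, Endo](e) // fine: any variance widens to invariant

val g: Get[Cat] = (_: Foo) => Cat("Audrey")
widen[Cat, Animal, Get](g) // fine: Get is covariant
// widen[Int, Any, Endo](e) // rejected: Endo is not covariant
```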
## Variance exhibits a subkinding relation
`same` takes an invariant type constructor `F`, but you are free to pass covariant and contravariant type constructors to it. That’s because there is a subtype relationship between these at the type level, or a subkind relationship.

Invariance is the ‘top’ variance, the most abstract, and co- and contravariance are its subvariances. When you pass a covariant or contravariant type constructor as `F` to `same`, its variance “widens”.
Because variance is part of the “type of type constructor”, not the specific parameter where the variance annotation appears, it’s an element of the kind of that type constructor, just as arity is. For example, when we talk about the kind of `Get`, we don’t say that `A` has a covariant kind, because this variance has nothing to do with `A`. Instead, we say that `Get` has kind `+* -> *`, because this particular variance annotation is all about the behavior of types referring to `Get`. Moreover, subvariance is just a restricted flavor of subkind.
## Flipping variances
I started to find it odd that this order of subclassing was enforced as a result of the subvariance relation.
```
mutable.Seq[A] extends Seq[+A]
CovCoy[F[+_]] extends Coy[F[_]]
```
It makes perfect sense empirically, though; you can easily derive `unsafeCoerce` if you assume this exact ordering isn’t enforced.
Then I found, while working on monad transformers, that this is the only way that makes sense, too.
```
InvMT[T[_[_]]] extends CovMT[T[_[+_]]]
CovMTT[W[_[_[+_]]]] extends InvMTT[W[_[_[_]]]]
```
I had seen this before.
## Type parameter positions are variance-contravariant
The kinds of type constructors (type-level functions) work like the types of value-level functions. Both sorts of functions are contravariant in the parameter position. So every extra layer of nesting “flips” the variance, just like with functions. Below is the version of this for value-level functions, akin to the examples above.
```scala
type One[-A]   =    A => Z
type Two[+A]   =   (A => Z) => Z
type Three[-A] =  ((A => Z) => Z) => Z
type Four[+A]  = (((A => Z) => Z) => Z) => Z
```
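As a quick sanity check that the double flip really makes `Two` covariant, here’s a sketch using the hypothetical `Cat`/`Animal` pair from earlier, with `Z` fixed arbitrarily:

```scala
object FlipCheck {
  type Z = Unit
  type Two[+A] = (A => Z) => Z

  val twoCat: Two[Cat] = k => k(Cat("Audrey"))
  val twoAnimal: Two[Animal] = twoCat // compiles: the position flips twice
}
```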
## A bottom variance: the diamond, completed
Scala wisely included a bottom type, `Nothing`, to go with its top type `Any`. That helps complete its subtyping system, but a bottom variance was unfortunately left out of its subkinding system. Here’s something that might work.
```
type ConstI[👻A] = Int

ConstI[A] ~ ConstI[B] // phantom, or 👻
```
This is exactly what you’d get if you applied both the covariant and contravariant rules to a type parameter. Therefore, phantom variance or anyvariance is more specific than either, and as the ‘bottom’ variance, completes the diamond. It is the perfect choice, because it is truly a greatest lower bound; it is both more specific than either by itself and no more specific than both together.
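Scala has no 👻 annotation today, but you can already see the “both rules at once” idea in the compiler: a parameter that appears in no position type-checks under either annotation.

```scala
// Both accepted by scalac, because A occurs nowhere on the right-hand
// side; a true phantom variance would subsume both at once.
type ConstIP[+A] = Int
type ConstIN[-A] = Int
```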
## Whence monad transformer variance?
There are two competing flavors of the monad transformers, like `OptionT`.
```scala
final case class NewOptionT[F[_], A](run: F[Option[A]])
final case class OldOptionT[F[+_], +A](run: F[Option[A]])
```
The first remains invariant over `A`, but conveniently doesn’t care about the variance of `F`—its declared variance is the “top” variance. The latter gives the more specific covariance over `A`, but requires `F` to be covariant (or phantom), which can be very inconvenient. These two transformers can’t be practically unified.
If you look at the structure, the variance of `A`’s position is always of a piece with the variance of `F`.
## Variance variables
```
final case class OptionT[😕V, F[V😕_], V😕A]
```
- The “variance variable” `V` is declared with the syntactic marker 😕.
- 😕 appears infix before `_` to say “`F` has the variance `V` for the type parameter in this position”.
- 😕 appears infix before `A` to say “the parameter `A` has the variance `V`”.
Therefore, when you specify the variance `V = +`, the remaining parameters have kinds `F[+_]` and `+A`, and similarly for the other variances.
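Spelled out, each choice of `V` would elaborate the hypothetical `OptionT` as follows (invented syntax, matching the smiley notation above):

```
// V = Inv: OptionT[F[_], A]    (today's NewOptionT)
// V = + :  OptionT[F[+_], +A]  (today's OldOptionT)
// V = - :  OptionT[F[-_], -A]
// V = 👻:  OptionT[F[👻_], 👻A]
```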
I’ve chosen the confused smiley 😕 to represent likely reactions to this idea, and especially to my complete lack of consideration for elegant syntax, but it’s really just a limited form of a kind variable in `PolyKinds`—after all, variance is part of the kind of the surrounding type constructor. And, as seen above, we already have subkind polymorphism with respect to variance, so what’s wrong with parametric kind polymorphism? “Variance variables” are just parametric kind polymorphism, but not as powerful.
## Variance bounds
The `OptionT` example is a simple case; it supports all possible variances, and features a simple relationship: whatever you select for `V` is exactly the variance used at the various places `V` appears. It’s easy to break this simple scheme, though.
```scala
final case class WrenchT[F[_], A](run: F[Option[A]], wrench: A)
```
Now the variance of the `A` position isn’t strictly the variance of `F`, as it was with `OptionT`; it can only be covariant or invariant, due to the wrench being in covariant position.
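To see the exclusion concretely, consider the monomorphic instantiations; a sketch with invented names:

```scala
// Covariant instantiation: fine; wrench puts A in covariant position.
final case class CovWrenchT[F[+_], +A](run: F[Option[A]], wrench: A)

// Contravariant instantiation: rejected with an error along the lines
// of “contravariant type A occurs in covariant position in value wrench”.
// final case class ConWrenchT[F[-_], -A](run: F[Option[A]], wrench: A)
```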
Well, we have variables over something with a conformance relation, so let’s add bounds!
```
final case class WrenchT[😕V >: +, F[V😕_], V😕A]
```
And these bounds themselves could be determined by other variance variables, &c, but I don’t want to dwell on that because the complications aren’t over yet.
## No easy unification
```scala
final case class Compose[F[_], G[_], A](run: F[G[A]])
```
This type complicates things even further! There are numerous possibilities based on the variances of `F` and `G`.
| F | G | A |
| --- | --- | --- |
| Inv | Inv | Inv |
| Inv | + | Inv |
| Inv | - | Inv |
| Inv | 👻 | 👻 |
| + | + | + |
| + | - | - |
| + | 👻 | 👻 |
| - | - | + |
| - | 👻 | 👻 |
| 👻 | 👻 | 👻 |
(The order of `F` and `G` doesn’t matter, so I’ve left out the reverses.) So the variance of `A`’s position is the result of multiplying the variances of `F` and `G`; the multiplication table is just above. Guess we need a notation for that.
```
[😕FV, 😕GV, 😕V >: FV × GV, F[FV😕_], G[GV😕_], V😕A]
```
The bound here only means that `V` is in variance-covariant position, but bounded by `FV × GV`.
## Another wrench
We can privilege `F` a little bit to make things more interesting.
```scala
final case class ComposeWr[F[_], G[_], A](run: F[G[A]], fa: F[A])
```
| F×G | F | A |
| --- | --- | --- |
| Inv, +, -, 👻 | Inv | Inv |
| +, 👻 | + | + |
| - | + | Inv |
| -, 👻 | - | - |
| + | - | Inv |
| 👻 | 👻 | 👻 |
Here’s another level to the function that determines the `A`-position variance: it’s now `lub(F×G, F)`, where lub is the least upper bound, the most specific variance that still holds both arguments as subvariances.
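Both operations are small enough to write down directly. Here’s a hedged sketch of the four-point lattice with × and lub as ordinary value-level functions; all names here are my own:

```scala
sealed trait Variance
case object Inv extends Variance     // top
case object Cov extends Variance     // +
case object Contra extends Variance  // -
case object Phantom extends Variance // 👻, bottom

object Variance {
  // ×: composition of nested positions, per the Compose table above
  def times(f: Variance, g: Variance): Variance = (f, g) match {
    case (Phantom, _) | (_, Phantom) => Phantom
    case (Inv, _) | (_, Inv)         => Inv
    case (Cov, v)                    => v      // + preserves g
    case (Contra, Cov)               => Contra // - flips +
    case (Contra, Contra)            => Cov    // - flips -
  }

  // lub: the most specific variance holding both arguments as subvariances
  def lub(a: Variance, b: Variance): Variance = (a, b) match {
    case (x, y) if x == y => x
    case (Phantom, v)     => v
    case (v, Phantom)     => v
    case _                => Inv // + with -, or anything with Inv
  }
}
```

With these, the `A`-position variance of `ComposeWr` is `lub(times(f, g), f)`, reproducing the table above.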
## Variance families? Really?
I hope you guessed where I was going when I said “function”: the rules determining the lower bound on the `A`-position variance can be specified by the programmer with a variance-level function—a mapping where the arguments and results are variances—or a variance family. Again, I don’t think this is terribly novel; it’s just a kind family, but more restricted.
You’d want it closed and total, and I can see the Haskell now.
```
variance family LUBTimes a b where
  LUBTimes - +    = -
  LUBTimes 👻 Inv = 👻
  …
```
That’s only because I’m not sure where to begin making up the Scala syntax.
## There are four variances
When I started looking for ways to describe the variances of the datatypes I’ve shown you, I noticed the relationship between variance and kinds, converted the problem to a kind-level problem, and started thinking of solutions in terms of `PolyKinds`. That’s where variance variables come from, and everything else follows from those.
However, I think I’ve made a mistake. Not with variance variables themselves, mind you, nor the variance-conformance already built into Scala. But, to deal with the problems that arise with only these, I’ve hypothesized tools—described above—that are way too powerful. They work on open domains of unbounded complexity—the kinds of types—and there are only four variances.
There are two reasons I think there must be a better approach.
First, there are finitely many variance families for a given sort.
Second, there are only so many ways to put variables in positions of variance. You get argument and result position of methods, within other type constructors, and that’s it. Things are simple enough that, discounting desire for working GADTs (which will be discussed in a later post, “Choosing variance for a phantom type”), it is always possible to infer which of the four variances a type parameter ought to have, in a first-order variance situation.
## A “good” polyvariance constraint system
Since there are only four variances, and only so many ways to combine them, it might be possible to design something more refined and suited for the task than the too-powerful variance families.
It might be that there are only a few useful variance relations, like × and lub, and a good solution would be to supply these relations along with an expression model to combine them. Or maybe not. I’ll stop hypothesizing and instead say what I think a “good” system would look like.
- It must be writable. Just as it is desirable to write a stronger type role than the inferred one in GHC Haskell ≥ 7.8, there are very common reasons to want a more general variance than the one that would be inferred. So the convenience of writing out the rule explicitly matters a great deal.
- It must be checkable. For variance variables, that means every possible variance you can choose puts every type parameter only in positions consistent with its variance. For example, our fully generalized `OptionT` always places `A` only in positions matching the variance of the `F` type constructor. We can just check every possible variance—up to four for each variable—but I think this is the wrong way to go. We don’t just enumerate over every possible type to check type parameters—that would take forever—we have a systematic way to check exactly one time, with skolemization. Variance is simpler—it should be an easier problem.
- Not a requirement—but it ought to be inferrable. In the same way that skolemization gives us a path to automatic generalization, if there is a similar way to do quantified variance checking, it should be possible to use the output of that decision procedure to determine the relationships between and bounds on variance variables.
How that decision is expressed is another question.
## Variance & GHC type roles
It might seem that Haskell, with no subtyping, would not care about this problem. But GHC 7.8 type roles are similar enough to variances; the main difference is that Scala variance is about the congruence/liftability of the strong conformance relation, while type roles are about the congruence/liftability of the weak type equality/“coercibility” relation.
```
nominal:          a ~  b → f a ~  f b
representational: a ~w b → f a ~w f b
phantom:                   f a ~w f b
```
This is pretty useful from a practical, performance-minded perspective, but the problem is that, given
```haskell
newtype MaybeT m a = MaybeT (m (Maybe a))
```
there is no way to describe the role of the `a` parameter in the most general way. It’s stuck at `nominal`, even if `m`’s parameter is `representational`.
Just as the integration of variance and higher kinds in Scala is incomplete without something like the polyvariance system I’ve been describing, Haskell’s type roles are not fully integrated with higher kinds.
I hope if one of these language communities finds a good solution, it is adopted by the other posthaste. The Haskell community is attempting to tackle these problems with roles; perhaps Scala can profit from its innovations. Much of the prose on the GHC matter can be profitably read by replacing “role” with “variance” where it appears. For example, this should sound familiar.
> This design incorporates roles into kinds. It solves the exact problems here, but at great cost: because roles are attached to kinds, we have to choose a type’s roles in the wrong place. For example, consider the `Monad` class. Should the parameter `m` have kind `*/R -> *`, requiring all monads to take representational arguments, or should it have kind `*/N -> *`, disallowing GND if `join` is in the `Monad` class? We’re stuck with a different set of problems.
## Subtyping or higher kinds?
As things stand, you will have a little trouble combining heavy use of subtyping and of higher kinds in the same system.
I’m not saying for certain that it comes down to one or the other. In Scalaz, we weakened support for subtyping to have better support for higher kinds, because its users typically do not want to use subtyping. However, this preference doesn’t generalize to the Scala community at large. This was only a real concern for monad transformers; most Scalaz constructs, for now, have fine subtyping support.
My suggestion is that you should favor higher kinds; they’re a more powerful abstraction mechanism, and ultimately easier to understand, than subtyping; they also happen to be less buggy in Scala. If you must use subtyping, be warned: it’s much more complex than it first seems.
## Further reading
- “Choosing variance for a phantom type” builds upon this post to connect variance and pattern matching
- “Of variance and functors” by Adelbert Chang, provides some foundational intuition for variance
- SI-2066 “Unsoundness in overriding methods with higher-order type parameters”, easily the most important surfacing of the interaction of higher kinds and variance, revealing the existence of subvariance
- The Liskov type, which builds every other subtyping property on reflexivity and contravariance