If you’re interested in design with zero-cost type tagging, or some
cases of AnyVal
I didn’t cover in the first article, or you’re
looking for something else I missed, check here. There’s a lot more I
didn’t have room
for
in the first article. Consider
this “bonus content”.
Unidirectional subst
We saw earlier that though subst
appears to substitute in only one
direction, that direction can easily be reversed. This is due to the
symmetry of type equality—if A = B
, then surely also B = A
.
Suppose that apply
implemented some per-String
validation
logic. In that case, you wouldn’t want users of the Label
API to be
able to circumvent this validation, wholesale; this is easy to do with
the subst
I have shown, and we saw it already when we tagged a whole
list and function, both designed only for plain String
s!
We can get an idea of how to fix this by comparing Leibniz
and
Liskov
. Looking at the signature of Liskov.subst
, you decide to
introduce widen
, replacing subst
.
// in LabelImpl def widen[F[+_]](ft: F[T]): F[String] // in val Label override def widen[F[+_]](ft: F[T]) = ft
With this design, you can untag a tagged list.
scala> Label.widen(taggedList) res0: List[String] = List(hello, world)
You can tag a function that takes an untagged list as parameter.
scala> def report(xs: List[String]): Unit = () report: (xs: List[String])Unit scala> def cwiden[F[-_]](fs: F[String]): F[Label] = Label.widen[Lambda[`+x` => F[x] => F[Label]]](identity)(fs) cwiden: [F[-_]](fs: F[String])F[Label] scala> cwiden[Lambda[`-x` => List[x] => Unit]](report) res1: List[Label] => Unit = $$Lambda$3263/1163097357@7e4f65b7
However, logically, this kind of “tagging” is just a delayed
“untagging” of the T
s involved, so your validation rules are
preserved.
What’s happening? With subst
, we selectively revealed a type
equality. widen
is deliberately less revealing; it selectively
reveals a subtyping relationship, namely, T <: String
.
scala> import scalaz.Liskov, Liskov.<~< scala> Label.widen[Lambda[`+x` => (Label <~< x)]](Liskov.refl) res2: scalaz.Liskov[Label.T,String] = scalaz.Liskov$$anon$3@58e8db18
Cheap tagging with validation
You can think of +
or -
in the signatures of widen
and cwiden
above as a kind of constraint on the F
that those functions take; by
contrast, subst
took any F
without bounds on its argument.
There are other interesting choices of constraint, like Foldable
.
import scalaz.{Failure, Foldable, Success, ValidationNel} import scalaz.syntax.std.option._ import scalaz.syntax.foldable._ // in LabelImpl, alongside def widen: def narrow[F[_]: Foldable](fs: F[String]) : ValidationNel[Err, F[T]] // in val Label override def narrow[F[_]: Foldable](fs: F[String]) = fs.foldMap{string => // return errors if not OK, INil() if OK }.toNel cata (Failure(_), Success(fs))
This is interesting because if you pass anything and get back a
Success
, the succeeding value is just the argument you passed in, no
reallocation necessary. (To reallocate, we would need Traverse
instead of Foldable
.)
Unidirectional without subtyping
If you prefer to avoid subtyping, you can also constrain subst
variants with typeclasses indicating directionality. For Scalaz or
Cats, providing both of these would be a sufficient substitute for the
widen[F[+_]]
introduced above.
def widen[F[_]: Functor](ft: F[T]): F[String] def cwiden[F[_]: Contravariant](fs: F[String]): F[T]
T = String
translucency
subst
and widen
are very powerful, but maybe you’re bothered by the fact
that T
erases to Object
, and you would rather “untagging” happen
automatically.
Thus far, you’ve been selectively revealing aspects of the type
relationship between T
and String
. What if you were to globally
reveal part of it?
To be clear, we must not globally reveal T = String
; then there
would be no usable distinction. But you can reveal weaker properties.
// in LabelImpl type T <: String
Now, widening happens automatically.
scala> taggedList: List[String] res0: List[String] = List(hello, world) scala> report: (List[Label] => Unit) res1: List[Label] => Unit = $$Lambda$3348/1710049434@4320749b
Narrowing is still forbidden; T
and String
are still separate.
scala> (taggedList: List[String]): List[Label] <console>:23: error: type mismatch; found : List[String] required: List[hcavsc.translucent.Labels.Label] (which expands to) List[hcavsc.translucent.Labels.Label.T] (taggedList: List[String]): List[Label] ^
Moreover, erasure looks like AnyVal
subclassing erasure again.
// javap -c -cp target/scala-2.12/classes hcavsc.translucent.MyFirstTests public java.lang.String combineLabels(java.lang.String, java.lang.String);
However, this makes it very difficult for typeclass resolution to
reliably distinguish String
and T
. It’s also easy to accidentally
untag. That’s why we took this out of Scalaz’s Tag
s; discriminating
typeclass instances is a very useful feature of tags. If these aren’t
concerns for you, globally revealed tag subtyping may be the most
convenient for you.
Boxing Int
s
AnyVal
might seem to have better, more justifiable boxing behavior
in the cast of primitive types like Int
. When putting than AnyVal
wrapper around Int
, the custom box replaces the plain Integer
box,
rather than adding another layer.
final class MagicInt(val x: Int) extends AnyVal val x = 42 val y = 84 // javap -c -cp target/scala-2.12/classes hcavsc.intsav.BytecodeTests List(x, y) // skipping some setup bytecode 13: newarray int 15: dup 16: iconst_0 17: iload_1 18: iastore 19: dup 20: iconst_1 21: iload_2 22: iastore 23: invokevirtual #25 // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray; 26: invokevirtual #29 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List; List(new MagicInt(x), new MagicInt(y)) // skipping more setup 37: anewarray #31 // class hcavsc/intsav/MagicInt 40: dup 41: iconst_0 42: new #31 // class hcavsc/intsav/MagicInt 45: dup 46: iload_1 47: invokespecial #35 // Method hcavsc/intsav/MagicInt."<init>":(I)V 50: aastore 51: dup 52: iconst_1 53: new #31 // class hcavsc/intsav/MagicInt 56: dup 57: iload_2 58: invokespecial #35 // Method hcavsc/intsav/MagicInt."<init>":(I)V 61: aastore 62: invokevirtual #39 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray; 65: invokevirtual #29 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;
By contrast, the opaque T
to Integer
when we apply(i: Int):
T
. It then remains in that box until we deliberately get the Int
back.
// MagicInt is defined like Label, // but over Int instead of String val x = MagicInt(42) // javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests 0: getstatic #21 // Field hcavsc/ints/MagicInts$.MODULE$:Lhcavsc/ints/MagicInts$; 3: invokevirtual #25 // Method hcavsc/ints/MagicInts$.MagicInt:()Lhcavsc/ints/MagicInts$MagicIntImpl; 6: bipush 42 8: invokevirtual #29 // Method hcavsc/ints/MagicInts$MagicIntImpl.apply:(I)Ljava/lang/Object; // javap -c -cp target/scala-2.12/classes 'hcavsc.ints.MagicInts$$anon$1' public java.lang.Object apply(int); Code: 0: aload_0 1: iload_1 2: invokevirtual #23 // Method apply:(I)I 5: invokestatic #29 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer; 8: areturn List(x, x) // skipping setup as before 19: anewarray #4 // class java/lang/Object 22: dup 23: iconst_0 24: aload_1 25: aastore 26: dup 27: iconst_1 28: aload_1 29: aastore 30: invokevirtual #43 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray; 33: invokevirtual #46 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;
While the boxing in the above example happened in MagicInt.apply
,
there’s nothing special about that function’s boxing; the standard
Int
boxing serves just as well.
// javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests val xs = List(42) 44: newarray int 46: dup 47: iconst_0 48: bipush 42 50: iastore 51: invokevirtual #50 // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray; 54: invokevirtual #46 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List; 57: astore_2 val mxs = MagicInt.subst(xs) 58: getstatic #21 // Field hcavsc/ints/MagicInts$.MODULE$:Lhcavsc/ints/MagicInts$; 61: invokevirtual #25 // Method hcavsc/ints/MagicInts$.MagicInt:()Lhcavsc/ints/MagicInts$MagicIntImpl; 64: aload_2 65: invokevirtual #54 // Method hcavsc/ints/MagicInts$MagicIntImpl.subst:(Ljava/lang/Object;)Ljava/lang/Object; val y: MagicInt = mxs.head 73: invokevirtual #60 // Method scala/collection/immutable/List.head:()Ljava/lang/Object; 76: astore 4
This is nice for two reasons:
subst
still doesn’t imply any additional boxing beyond what the underlying primitive type implies.- Where the primitive boxing is optimized, you get to keep those
optimizations;
AnyVal
subclass boxing effectively turns off these optimizations. For example,Integer
boxing is optimized, butMagicInt
’sAnyVal
class is not.
The one remaining problem with the tag version of MagicInt
is that
its erasure is still Object
.
def myId(x: MagicInt): MagicInt // javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests public abstract java.lang.Object myId(java.lang.Object);
However, if you
use
the “translucent” variant
where it is always known that type T <: Int
, the erasure is the same
as Int
itself.
// javap -c -cp target/scala-2.12/classes hcavsc.translucentints.OtherTests public abstract int myId(int);
(The boxing/unboxing of MagicInt
changes to match.) Unfortunately,
there’s no way to tell Scala what the erasure ought to be without
exposing that extra type information, which may be quite inconvenient.
Would you box a JavaScript string?
Maybe if we weren’t working with types. Since we are working with types, we don’t have to box our strings in JavaScript in order to keep track of what sort of strings they are. But Scala might want to, anyway.
val x = new Label("hi") js.Array(x, x) // sbt fastOptJS output [new $c_Lhcavsc_av_Label().init___T("hi"), new $c_Lhcavsc_av_Label().init___T("hi")];
Surely it doesn’t have to for our tag-like Label
. And indeed it
doesn’t.
val h = Label("hi") // compiles to var h = "hi"; // fastOptJS is smart enough to know // that apply can be elided val hs = js.Array(h, h) // compiles to var hs = [h, h]; val strs = Label.subst[Lambda[x => js.Array[x] => js.Array[String]]](identity)(hs) strs(0) + strs(1) // compiles to (("" + $as_T(hs[0])) + hs[1]) // fastOptJS is smart enough to know // that subst, too, can be elided
The possible existence of subst
tells us something about the deeper
meaning of our abstract type definition, type T = String
, that holds
true no matter how much of this equality we hide behind existential
layers. It is this: the compiler cannot predict when the fact that T
= String
will be visible, and when it will not be. It must therefore
not generate code that would “go wrong” in contexts where this is
revealed.
For example, at one point, we saw that
Label.subst(Monoid[String])
would yield indeed produce a suitable Monoid[Label]
. This means not
only is the value’s type reinterpreted, but also, by consequence, its
members.
scala> val labelMonoid = Label.subst(Monoid[String]) labelMonoid: scalaz.Monoid[Label.T] = scalaz.std.StringInstances$stringInstance$@6f612117 scala> labelMonoid.zero res0: hcavsc.subst.Labels.Label.T = "" scala> labelMonoid.append _ res1: (Label.T, => Label.T) => Label.T = $$Lambda$3184/987934553@3af2619b
However, in subst
, we have charged the compiler with doing this
arbitrarily complex substitution with 100% accuracy and in constant
time. There are no opportunities to generate “wrappers”, not for these
structures that merely employ Label
in their types. And, by
consequence, there’s nowhere to put code that would use some means to
treat Label
and String
differently based on runtime choices.
If you wish to automatically add “wrappers”, you have a difficult problem already with parametric polymorphism. With higher-kinded types, you have an intractable problem.
Speaking of higher-kinded types…
Type tagging works perfectly well with parameterized types.
type KWConcrete[W, A, B] = Kleisli[(W, ?), A, B] sealed abstract class KWImpl { type T[W, A, B] def subst[F[_[_, _, _]]](fk: F[KWConcrete]): F[T] } val KW: KWImpl = new KWImpl { type T[W, A, B] = KWConcrete[W, A, B] override def subst[F[_[_, _, _]]](fk: F[KWConcrete]) = fk } type KW[W, A, B] = KW.T[W, A, B]
This is nice for a few reasons.
- You can still “add a type parameter” to do abstraction on your tagged types.
- You can hide much of the complexity of a monad transformer stack,
allowing it to infer more
easily
with
Unapply
or-Ypartial-unification
. This is because, unlike standalone type aliases,scalac
can’t dealias your abstraction away. (Warning: this doesn’t apply if you make the typeT
“translucent”; hide your types to keep them safe fromscalac
’s prying expander.) - You can use
subst
to “GND” yourMonad
and other typeclass instances.
implicit def monadKW[W: Monoid, A]: Monad[KW[W, A, ?]] = { type MF[KWC[_, _, _]] = Monad[KWC[W, A, ?]] // KW.subst[MF](implicitly) with better inference KW.subst[MF](Kleisli.kleisliMonadReader[(W, ?), A]) }
“Tagless final effects à la Ermine Writers” develops this kind of type abstraction in another direction.
For the derivation of subst
’s weird signature above,
see
“Higher Leibniz”.
Why is the : LabelImpl
ascription so important?
Suppose that you ignored my comments and defined the concrete
LabelImpl
without an ascription.
val Label = new LabelImpl { // ...implementation continues as before
Then, the abstraction would disappear; you would no longer have a “new type”.
scala> val lbl: Label = "hi" lbl: Label = hi scala> lbl: String res0: String = hi scala> implicitly[Label =:= String] res1: =:=[Label,String] = <function1>
Why did it break so hard? Well, the inferred type of val Label
is
different from the one you were ascribing.
scala> Label res2: LabelImpl{type T = String} = hcavsc.broken.Labels$$anon$1@48cd7b32
That means that Label.T
is no longer existential; it’s known,
and known to be String
. Accordingly, type Label
also expands to
String
, and vice versa.
If you want it a new type, you must keep it existential.
Some background
The unboxed tagging technique is based
on cast-free type tags
in the upcoming Scalaz 7.3.0. That, in turn, was based
on
use of existential types in Ermine's implementation to
hide expansions from scalac
.
This is also a specialization of the type-member based MTL encoding I used in "Tagless final effects à la Ermine Writers". The essential difference is that individual program elements were universally quantified over the expansion of the abstract type, where here, the entire program is universally quantified over that expansion, because the existential quantifier is globally bound.
I’m certainly not the first person to explore this technique; for example, Julian Michael wrote about it several months before this article.
And, of course, if you are an ML (OCaml, SML, &c) fan, you’re probably thinking “yeah, so what? I do this all the time.” Sorry. We can be a little slow on the uptake in Scala world, where we greatly undervalue the ideas of the functional languages before us.
This article was tested with Scala 2.12.1, Scalaz 7.2.10, Scala.js 0.6.13, and Kind Projector 0.9.3. The code is available in compilable form for your own experiments via Bazaar.
2 comments:
I have really enjoyed both articles.
One could be tempted to statically scan Scala source code for bad smells of AnyVal.
This brings the question about what are the good smells of AnyVal?
In particular, do you agree that using AnyVal for a Value class is acceptable since it helps avoid unnecessary runtime allocations as described here: http://docs.scala-lang.org/overviews/core/value-classes.html
It is a good choice for adding methods a la implicit class, or the old style of class + implicit def (implicit class does not work for all cases).
Otherwise, it seems like it is not a good choice.
Post a Comment