The claim of a multi-paradigm language is to harmoniously serve
various approaches to programming. The AnyVal
subclass feature forms
a strong counterargument to Scala’s multiparadigm claim.
AnyVal
subclasses penalize parametric-polymorphic, type-safe
programming, in order to better support type-unsafe programming
styles, such as those making use of isInstanceOf
. They sneakily
shift the blame for their performance problems onto type safety and
polymorphism. I will provide an existence proof that the blame ought
to land squarely on AnyVal
subclasses, but I cannot stop this
blame-shifting from lending further credence
to
the witticism “If scala was the only language I had to think in, I’d think functional programming was a bad idea that didn’t scale, too.”
Moreover, by creating the false impression that the “newtype problem”
has been solved in Scala, AnyVal
subclasses obscure solutions that
better serve polymorphic, type-safe programming. While I describe such
a solution in this article, I have no illusions that I alone can
reverse the upward trend of the AnyVal
meme.
Scala, today, has the potential to better support type-safe
programming, and it has since before the advent of AnyVal
subclasses. In this article, we will focus on how the language could
reveal this potential, becoming a better foundation for polymorphic,
type-safe programming than it advertises today.
Suppose that you want a “wrapper” around String
s with a unique type
so that they can’t be accidentally confused with arbitrary
String
s. This is a common use case for a newtype, a wrapper with
intentionally incompatible type that exists only at compile time. (The
name “newtype” comes from the
Haskell keyword for its version of this feature.)
You decide to use extends AnyVal
, since you have heard that this is
a compile-time-only class that doesn’t get allocated on the heap.
class Label(val str: String) extends AnyVal
object Label {
def apply(s: String): Label =
new Label(s)
}
This seems to do the trick with your first several tests.
class MyFirstTests {
def combineLabels(l: Label, r: Label): Label =
Label(l.str + r.str)
def printLabels(): Unit = {
val fst = Label("hello")
val snd = Label("world")
println(fst.str)
println(snd.str)
}
}
As reported by javap
, the new Label
goes away for Label.apply
.
// javap -c -cp target/scala-2.12/classes hcavsc.av.Label$
public java.lang.String apply(java.lang.String);
Code:
0: aload_1
1: areturn
It vanishes for the signature of combineLabels
too, meaning that we
can write some functions over Label
s without allocating them.
// javap -cp target/scala-2.12/classes hcavsc.av.MyFirstTests
public java.lang.String combineLabels(java.lang.String, java.lang.String);
You can even use Label
in a case class
, and it will be String
at
runtime.
case class Labelled[A](lbl: Label, a: A)
// javap -p -cp target/scala-2.12/classes hcavsc.av.Labelled
private final java.lang.String lbl;
private final A a;
But then, you decide that you want a List
of Label
s.
// add to printLabels
val lbls = List(fst, snd)
// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests
24: iconst_2
25: anewarray #56 // class hcavsc/av/Label
28: dup
29: iconst_0
30: new #56 // class hcavsc/av/Label
33: dup
34: aload_1
35: invokespecial #59 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
38: aastore
39: dup
40: iconst_1
41: new #56 // class hcavsc/av/Label
44: dup
45: aload_2
46: invokespecial #59 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
49: aastore
50: invokevirtual #63 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
53: invokevirtual #66 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;
Huh. Didn’t expect those two new
s to be there. Ah well, maybe now
that they’re in the list,
lbls.map{x => Label(x.str + "Aux")}
// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests
public static final java.lang.Object $anonfun$printLabels$1$adapted(java.lang.Object);
Code:
0: new #61 // class hcavsc/av/Label
3: dup
4: aload_0
5: checkcast #61 // class hcavsc/av/Label
8: invokevirtual #117 // Method hcavsc/av/Label.str:()Ljava/lang/String;
11: invokestatic #119 // Method $anonfun$printLabels$1:(Ljava/lang/String;)Ljava/lang/String;
14: invokespecial #64 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
17: areturn
OK, sure, so you took it out and put it back, so it unboxed and then
boxed again. How about a tuple, instead?
// add to printLabels
(fst, snd)
// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests
73: new #103 // class scala/Tuple2
76: dup
77: new #61 // class hcavsc/av/Label
80: dup
81: aload_1
82: invokespecial #64 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
85: new #61 // class hcavsc/av/Label
88: dup
89: aload_2
90: invokespecial #64 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
93: invokespecial #106 // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)Vf
Two more new
s. Fine. How about the identity
method?
// add to printLabels
identity(fst)
// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests
97: getstatic #59 // Field scala/Predef$.MODULE$:Lscala/Predef$;
100: new #61 // class hcavsc/av/Label
103: dup
104: aload_1
105: invokespecial #64 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
108: invokevirtual #109 // Method scala/Predef$.identity:(Ljava/lang/Object;)Ljava/lang/Object;
So there seems to be an impressive collection of things that will
cause an AnyVal
subclass to box. You assume there’s a good reason
they implemented it this way;
we’ll
get into that later.
However, you decide to look for an alternative newtype
mechanism
that doesn’t box, under the theory that scalac
’s reasons for boxing
AnyVal
subclasses don’t apply to the use cases you have in mind for
Label
and similar things in your codebase.
You have heard that Scalaz’s “type tags” are a kind of newtype with no
boxing. You could just pull in scalaz-core
and see if you can get
them to work, but decide to implement Label
directly using the same
technique as Scalaz tags, instead.
object Labels {
sealed abstract class LabelImpl {
type T
def apply(s: String): T
def unwrap(lbl: T): String
}
// do not forget `: LabelImpl`; it is key
val Label: LabelImpl = new LabelImpl {
type T = String
override def apply(s: String) = s
override def unwrap(lbl: T) = lbl
}
type Label = Label.T
}
import Labels._
While regretting that the compiler no longer makes your Label
type
very convenient to define, you press on. First, to confirm, you can’t
treat an arbitrary String
as a Label
:
scala> "hi there": Label
<console>:15: error: type mismatch;
found : String("hi there")
required: hcavsc.subst.Labels.Label
(which expands to) hcavsc.subst.Labels.Label.T
"hi there": Label
^
So far, so good. Then, why not retry some of the earlier experiments
that caused the AnyVal
-based label to box?
// javap -c -cp target/scala-2.12/classes hcavsc.subst.MyFirstTests
val fst = Label("hello")
val snd = Label("world")
identity(fst)
24: getstatic #43 // Field scala/Predef$.MODULE$:Lscala/Predef$;
27: aload_1
28: invokevirtual #47 // Method scala/Predef$.identity:(Ljava/lang/Object;)Ljava/lang/Object;
(fst, snd)
32: new #49 // class scala/Tuple2
35: dup
36: aload_1
37: aload_2
38: invokespecial #53 // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)V
val lbls = List(fst, snd)
48: iconst_2
49: anewarray #4 // class java/lang/Object
52: dup
53: iconst_0
54: aload_1
55: aastore
56: dup
57: iconst_1
58: aload_2
59: aastore
60: invokevirtual #62 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
63: invokevirtual #65 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;
lbls.map{x => Label(Label.unwrap(x) + "Aux")}
public static final java.lang.Object $anonfun$printLabels$1(java.lang.Object);
Code:
0: getstatic #26 // Field hcavsc/subst/Labels$.MODULE$:Lhcavsc/subst/Labels$;
3: invokevirtual #30 // Method hcavsc/subst/Labels$.Label:()Lhcavsc/subst/Labels$LabelImpl;
6: new #104 // class java/lang/StringBuilder
9: dup
10: invokespecial #106 // Method java/lang/StringBuilder."<init>":()V
13: getstatic #26 // Field hcavsc/subst/Labels$.MODULE$:Lhcavsc/subst/Labels$;
16: invokevirtual #30 // Method hcavsc/subst/Labels$.Label:()Lhcavsc/subst/Labels$LabelImpl;
19: aload_0
20: invokevirtual #110 // Method hcavsc/subst/Labels$LabelImpl.unwrap:(Ljava/lang/Object;)Ljava/lang/String;
23: invokevirtual #114 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
26: ldc #116 // String Aux
28: invokevirtual #114 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
31: invokevirtual #120 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
34: invokevirtual #36 // Method hcavsc/subst/Labels$LabelImpl.apply:(Ljava/lang/String;)Ljava/lang/Object;
37: areturn
No allocation there. Hmm. Well, maybe our concrete LabelImpl
instance is doing some secret boxing?
// javap -c -cp target/scala-2.12/classes 'hcavsc.subst.Labels$$anon$1'
public java.lang.String apply(java.lang.String);
Code:
0: aload_1
1: areturn
public java.lang.Object apply(java.lang.String);
Code:
0: aload_0
1: aload_1
2: invokevirtual #27 // Method apply:(Ljava/lang/String;)Ljava/lang/String;
5: areturn
public java.lang.String unwrap(java.lang.String);
Code:
0: aload_1
1: areturn
public java.lang.String unwrap(java.lang.Object);
Code:
0: aload_0
1: aload_1
2: checkcast #21 // class java/lang/String
5: invokevirtual #23 // Method unwrap:(Ljava/lang/String;)Ljava/lang/String;
8: areturn
No boxing there. That makes sense; in that context, Label
is
String
; the fact that our Label
-using code doesn’t know that is
irrelevant, because we hid that information using existential types.
So, it is possible to have a newtype
mechanism that doesn’t
box. You don’t have to wait for the JVM to
deliver
its own brand of value types; you
can even implement it yourself, in Scala, today. They must have had
another reason for all this boxing, because “we have to because JVM”
is denied by the behavior of Scala-JVM itself.
You aren’t sure what those reasons are, but you decide to port the
rest of your code to use the existential Label
. Befitting an unboxed
newtype, the runtime representation of List[Label]
is exactly the
same as the underlying List[String]
, as well as every Option
,
Either
, and whatever else you can think up.
You notice that the erasure for Label
is different, but this seems
significantly less serious than the boxing problem, so leave it for
now. (We
will
dig into related design decisions later.)
Let’s start with a quick comparison of boxing AnyVal
and the “type
tagging” mechanism we’ve just seen.
Capability |
AnyVal subclass |
Type tag |
Defining methods |
normal override ; virtual method dispatch available |
implicit class enrichment only |
lbl.getClass |
Label |
String |
Cast Any to Label |
checked at runtime |
unchecked; no wrapper left at runtime |
isInstanceOf |
checked at runtime |
unchecked; same reason casting doesn’t work |
Adding type parameters to methods |
boxing/unbox penalty |
no boxing penalty |
Wrapping a List |
O(n): box every element and reallocate list itself |
O(1), with subst : no allocation, output list eq to input list |
Unwrapping a list |
O(n): reallocate list, unbox each element |
O(1): eq output with subst . Also possible to make unwrapping a <: (free liftable automatic upcast) |
Coinductive type class instances |
works; boxing penalty applies |
works; no boxing penalty |
Wrapping whole program parts |
each function must be wrapped to add per-value wrapping/unwrapping |
O(1): just works with subst |
I detect from this matrix a particular theme: AnyVal
subclasses give
up a lot of capability in the type-safe arena. Consider rewriting a
loop that uses Label
as state as a foldLeft
: you must contend with
a new boxing/unboxing penalty, since the state parameter in a
foldLeft
is type-parametric. It’s more fodder for the persistent
higher-order function skeptics among us.
While we know that adding type parameters to our functions improves
type-safety, the skeptic will note the boxing penalty, and attribute
it to parametric polymorphism. But we know the true culprit.
If AnyVal
subclassing taxes type-safe programming in these ways,
what is it spending the money on? Simple: support for isInstanceOf
,
“safe” casting, implementing interfaces, overriding AnyRef
methods
like toString
, and the like.
As type-safe, parametrically-polymorphic programmers, we avoid these
features, as a matter of principle and of practice. Some, like checked
casting, are simply not type-safe. Some ruin free theorems, like
toString
, and we would prefer safe mechanisms, like the Show
typeclass, to actually tell us at compile time if our programs make
sense. Yet, if we use AnyVal
subclasses, we have to pay the price
for all the programmers that wish to write type-unsafe code, like
List[Any] => List[Label]
. All is not well in Multiparadigmatic Land.
To showcase the relationship of the two approaches to
runtime-reflective programming versus statically-proven programming,
let’s consider stringification.
Scala provides the toString
virtual method on Any
. Calling this
method is dynamically resolved on the value itself; it is as if every
value must carry around a pointer to a function that, given itself,
returns a String
. We can define this for our original AnyVal
-based
Label
, and so toString
on List
et al will also work.
// add to class Label
override def toString = s"Label($str)"
scala> List(fst, snd).toString
res0: String = List(Label(hello), Label(world))
scala> Some(fst).toString
res1: String = Some(Label(hello))
Moreover, this “works” even for the type List[Any]
.
scala> List[Any](fst, "hi").toString
res2: String = List(Label(hello), hi)
You cannot override toString
for our fully-erased Label
. After
all, every Label
is just a String
at runtime!
(Different types, same class.)
However, the type-safe programmer will recognize List[Any]
as a type
that, if it occurs in her program, means “something has gone wrong
with this program”. Moreover, because toString
doesn’t make sense
for all types, we use a static mechanism,
like
the scalaz.Show
typeclass. And
this works fine for Label
, because it is statically resolved by
type, not dependent on an implicit runtime member of every Label
; in
fact, it can only work because it is static!
// add to object Labels
import scalaz.Show
implicit val showLabel: Show[Label] =
Show shows {lbl =>
s"Label(${Label.unwrap(lbl)})"}
scala> import scalaz.syntax.show._, scalaz.std.list._,
scalaz.std.option._
scala> List(fst, snd).shows
res1: String = [Label(hello),Label(world)]
scala> some(fst).shows
res2: String = Some(Label(hello))
So if you are doing this kind of programming, it doesn’t matter
whether you can’t override toString
, or type test, &c; you weren’t
doing it anyway. But, aside from a little performance bump, what do
you gain from unboxed type-tagging?
You notice that subst
is at the foundation of several Scalaz
constructs like Leibniz
and Liskov
, and plays a prominent role in
the Tag
API as well. You decide to add this to your LabelImpl
as
well.
// in LabelImpl
def subst[F[_]](fs: F[String]): F[T]
// and in val Label
override def subst[F[_]](fs: F[String]) = fs
It’s interesting that you can use this to tag a whole List[String]
in constant time:
scala> val taggedList = Label.subst(List("hello", "world"))
taggedList: List[Label.T] = List(hello, world)
It’s also interesting that you can use this to untag a whole list
in constant time.
scala> Label.subst[Lambda[x => List[x] => List[String]]](identity)(taggedList)
res0: List[String] = List(hello, world)
Functions and typeclass instance can be tagged or untagged, too.
scala> Label.subst[Lambda[x => (x, Int) => x]](_ substring _)
res1: (Label.T, Int) => Label.T = $$Lambda$3194/964109489@72557d64
scala> import scalaz.Monoid, scalaz.std.string._
scala> Label.subst(Monoid[String])
res3: scalaz.Monoid[Label.T] = scalaz.std.StringInstances$stringInstance$@252798fe
All of this works because subst
is really evidence
that,
deep down, String
and Label
are the same.
scala> import scalaz.Leibniz, Leibniz.{===, refl}
scala> Label.subst[String === ?](refl)
res4: Leibniz[Nothing,Any,String,Label.T] = scalaz.Leibniz$$anon$2@702af12c
Yet, you ran an experiment earlier to prove that you can’t confuse
String
and Label
; indeed, this still holds true, despite the
presence of subst
!
scala> "still a string": Label
<console>:21: error: type mismatch;
found : String("still a string")
required: hcavsc.subst.Labels.Label
(which expands to) hcavsc.subst.Labels.Label.T
"still a string": Label
^
scala> Label("still a label"): String
<console>:21: error: type mismatch;
found : hcavsc.subst.Labels.Label.T
required: String
Label("still a label"): String
^
Here’s what’s happening: in a sense, (new Label(_)): (String =>
Label)
and (_.str): (Label => String)
witness that there’s a
conversion between the two types. subst
witnesses that there’s
identical runtime representation between its own two types. You get
to selectively reveal this evidence when it makes writing your
program more convenient; the rest of the time, it is hidden.
But I would like to step one level up: this is a design space, and
subst
as we have seen it isn’t appropriate for all designs. As the
author of your own abstract newtypes, you get to choose how much, if
any, of this underlying type equality to reveal.
For various reasons, the above is how Scalaz Tag
(@@
) is
defined. If you wish these semantics, you might as well throw
everything else away and write
sealed trait LabelTag // no instances
type Label = String @@ LabelTag
val Label = Tag.of[LabelTag]
and take advantage of the convenient tools around subst
defined in
Tag.Of
. But it’s not the only choice! It’s one point in the design
space. To do right by your API users, it’s
worth
exploring that design space a little more.
Unboxed existential tagging spreads through your codebase. You feel
free to apply it liberally, because you know you aren’t paying the
wrapping costs of AnyVal
subclasses; all these new abstraction
layers are pure type-level, and fully erased.
You receive a “bug report” from a fellow developer that this
expression never seems to filter out the non-label String
s.
(xs: List[Any]).collect{case t: Label => t}
<console>:16: warning: abstract type pattern
hcavsc.translucent.Labels.Label.T
(the underlying of hcavsc.translucent.Labels.Label)
is unchecked since it is eliminated by erasure
(xs: List[Any]).collect{case t: Label => t}
^
<console>:16: warning: The outer reference
in this type test cannot be checked at run time.
(xs: List[Any]).collect{case t: Label => t}
^
Your mind
on
safe pattern matching practice,
you add def unapply(s: String): Option[T]
to LabelImpl
and counsel
preference for the form case Label(t) => ...
, as well as to not
ignore -unchecked
warnings.
You get another bug report that this always seems to succeed.
(s: String).asInstanceOf[Label]
Repeating your advice about warnings, you start to wonder, “where is
this kind of code coming from?”
Someone else complains that they want to make T extends Ordered[T]
,
and can’t fathom where the code should go. You advise the static
approach of implementing the Ordering
typeclass instance instead for
T
, wonder how deep the object-orientation hole goes, and forward the
link about the typeclass pattern again, too.
We’ve seen that AnyVal
subclasses could have been incredibly
cheap, but aren’t, so as to support “features” like checked
casting. Who’s going to foot the bill?
- Oh, this allocates when passing through polymorphic contexts, but
not monomorphic ones? Avoid polymorphic code.
- Oh, this extra type-safety adds all this allocation? Type safety is
expensive at runtime; we need to stick to
String
.
- We can’t do any better; the JVM limits the possibilities. You have
to pay for runtime class wrapping if you want a wrapper type.
In this article, I have demonstrated that none of these conclusions
are correct. However, only a tiny minority of Scala practitioners will
ever read this article, and I will not blame the rest for drawing
these seemingly straightforward inferences, ultimately faulty as they
are.
The real cost of AnyVal
subclasses is not all the needless memory
allocation. The real cost is the damage to the practice of type-safe
programming in Scala. It’s in all the curious developers who sought to
add a little more type safety to their programs, only to find
themselves penalized by the runtime, once bitten. It’s in the
reinforcement of this attitude towards abstraction that they’ll
continue to carry with them, the next time an opportunity presents
itself. It’s a missed opportunity for pure type-level thinking, all so
that asInstanceOf
“works”.
See
“…and the glorious subst
to come” for
further development of the ideas in this article.
This article was tested with Scala 2.12.1, Scalaz 7.2.10, and Kind
Projector 0.9.3. The code
is
available in compilable form for your own experiments via Bazaar.