Saturday, April 15, 2017

Why I didn't sign the Scala CLA

I wrote this shortly after I opted not to sign the Scala CLA in 2015. Since Scala still requires a CLA in its contribution process, and even contributing to Typelevel Scala effectively requires assent to the same unjust mechanism, I have decided to publish it at last.

One of the most important advantages of Free, Open Source Software (FOSS) is that it returns power to the community of users. With proprietary software, power is always concentrated in the hands of the maintainer, i.e. the copyright holder.

The [more] equal status of maintainer and user in FOSS creates a natural check. It keeps honest, well-intentioned maintainers honest, and permits the community to reform around new maintainership should a formerly good situation change. And circumstances can always change.

This equal status does not fall out of the sky; it is mediated by a legal constitution: the license(s) of the software and documentation developed by the project. When users accept the license terms—by redistributing the code or changes thereto—they agree to this constitution. When maintainers accept contributions under that license, as in an ordinary CLA-less project, under inbound=outbound, they agree to the very same constitution as the users.

A project with a CLA or ©AA is different. There is one legal constitution for the users, and one for the maintainers. This arrangement always privileges the maintainers by

  1. removing privileges from the users and reserving them for the maintainers, and
  2. removing risk from the maintainers and reserving it for the users.

Despite fine words in the Scala CLA about “being for your protection as well as ours” (to paraphrase), the terms that follow are, with few exceptions, utterly and unapologetically nonreciprocal.

I believe this situation is acceptable in some cases; the only such agreements I have signed without regret are with the FSF. But no CLA or ©AA I have ever seen makes the strong reciprocal promises that the FSF does, and it is anyway unreasonable to expect any contributor to so carefully evaluate the likely future behavior of each organization maintaining some software they might like to contribute to. For myself, I decided that, given my past regrets, and the degree to which EPFL’s agreement transfers power to its own hands and risk back to the contributors’, there was no way I would come to trust EPFL sufficiently to sign.

This is not to say that EPFL would be an ill-behaved caretaker! But by what means could I make that determination? Moreover, why is it even necessary?

The closest thing to an acceptable rationale for the Scala CLA is that it addresses legal concerns left unmentioned by the license, e.g. patent grants. These are important concerns, too frequently unaddressed by projects using minimalist licenses such as Scala uses. But the appropriate place to address these concerns is the basic legal constitution for all: the license. If these guarantees are so important that EPFL must have them, then why should we, as contributors, not ask them of EPFL, via inbound=outbound? If these terms would make the license “too complex”, no longer minimal, what about their placement in a CLA will make them any better understood?

It’s my hope that Scala will abandon the CLA, and switch to a lightweight option that holds true to the principles of FOSS projects. A couple of options are

  1. A formal license-assent-only mechanism, like Selenium’s.
  2. A Developer Certificate of Origin, like the Linux kernel.

This may or may not be coupled with the switch to a longer license that incorporates stronger patent protections, like Apache License 2.0. This should alleviate the concerns that are currently addressed by the CLA, but in a way that is equitable to the Scala project, all of its contributors, and all of its users.

Sunday, April 9, 2017

...and the glorious subst to come

If you’re interested in design with zero-cost type tagging, or some cases of AnyVal I didn’t cover in the first article, or you’re looking for something else I missed, check here. There’s a lot more I didn’t have room for in the first article. Consider this “bonus content”.

Unidirectional subst

We saw earlier that though subst appears to substitute in only one direction, that direction can easily be reversed. This is due to the symmetry of type equality—if A = B, then surely also B = A.
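To make the reversal concrete, here is a minimal sketch in plain Scala with no Scalaz or Kind Projector. The unsubst helper and the Flip alias are hypothetical names of mine, not part of the Label API; the only trick is picking an F that puts the tagged type on the argument side of a function.

```scala
object SubstSymmetry {
  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def subst[F[_]](fs: F[String]): F[T]
  }

  // The `: LabelImpl` ascription keeps T abstract for everyone else.
  val Label: LabelImpl = new LabelImpl {
    type T = String
    def apply(s: String): T = s
    def subst[F[_]](fs: F[String]): F[T] = fs
  }

  type Label = Label.T

  // Hypothetical helper: the reverse direction, derived from subst alone.
  // Choosing F[X] = G[X] => G[String] turns the identity function on
  // G[String] into a function G[Label] => G[String].
  def unsubst[G[_]](gl: G[Label]): G[String] = {
    type Flip[X] = G[X] => G[String]
    Label.subst[Flip]((gs: G[String]) => gs)(gl)
  }

  val tagged: List[Label] = Label.subst[List](List("hello", "world"))
  val untagged: List[String] = unsubst[List](tagged)
}
```

Both directions hand back the very same list, so nothing is copied on the way in or out.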

Suppose that apply implemented some per-String validation logic. In that case, you wouldn’t want users of the Label API to be able to circumvent this validation, wholesale; this is easy to do with the subst I have shown, and we saw it already when we tagged a whole list and function, both designed only for plain Strings!

We can get an idea of how to fix this by comparing Leibniz and Liskov. Looking at the signature of Liskov.subst, you decide to introduce widen, replacing subst.

// in LabelImpl
  def widen[F[+_]](ft: F[T]): F[String]

// in val Label
  override def widen[F[+_]](ft: F[T]) = ft

With this design, you can untag a tagged list.

scala> Label.widen(taggedList)
res0: List[String] = List(hello, world)

You can tag a function that takes an untagged list as parameter.

scala> def report(xs: List[String]): Unit = ()
report: (xs: List[String])Unit

scala> def cwiden[F[-_]](fs: F[String]): F[Label] =
         Label.widen[Lambda[`+x` => F[x] => F[Label]]](identity)(fs)
cwiden: [F[-_]](fs: F[String])F[Label]

scala> cwiden[Lambda[`-x` => List[x] => Unit]](report)
res1: List[Label] => Unit = $$Lambda$3263/1163097357@7e4f65b7

However, logically, this kind of “tagging” is just a delayed “untagging” of the Ts involved, so your validation rules are preserved.

What’s happening? With subst, we selectively revealed a type equality. widen is deliberately less revealing; it selectively reveals a subtyping relationship, namely, T <: String.

scala> import scalaz.Liskov, Liskov.<~<

scala> Label.widen[Lambda[`+x` => (Label <~< x)]](Liskov.refl)
res2: scalaz.Liskov[Label.T,String] = scalaz.Liskov$$anon$3@58e8db18
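If Kind Projector isn’t in your build, widen and cwiden can be written with ordinary type aliases instead. This is only a sketch under assumptions of mine: the validation rule (labels must be non-empty), an apply that returns Option[T], and the Sink alias are all hypothetical, chosen to show that validation cannot be bypassed through widen.

```scala
object ValidatedLabels {
  sealed abstract class LabelImpl {
    type T
    // Hypothetical validation rule: a label must be non-empty.
    def apply(s: String): Option[T]
    def widen[F[+_]](ft: F[T]): F[String]
  }

  val Label: LabelImpl = new LabelImpl {
    type T = String
    def apply(s: String): Option[T] = if (s.nonEmpty) Some(s) else None
    def widen[F[+_]](ft: F[T]): F[String] = ft
  }

  type Label = Label.T

  // cwiden without Kind Projector: G is covariant in X because X sits
  // in the contravariant F, which is itself in argument position.
  def cwiden[F[-_]](fs: F[String]): F[Label] = {
    type G[+X] = F[X] => F[Label]
    Label.widen[G]((fl: F[Label]) => fl)(fs)
  }

  // A consumer of lists, contravariant in the element type.
  type Sink[-X] = List[X] => Int
  val totalLength: Sink[String] = xs => xs.map(_.length).sum
  val labelLength: Sink[Label] = cwiden[Sink](totalLength)

  // The only way to obtain a Label is through the validating apply.
  val labels: List[Label] = List("hello", "", "world").flatMap(s => Label(s).toList)
  val strings: List[String] = Label.widen[List](labels)
}
```

There is no function here that turns a List[String] into a List[Label] wholesale, which is the point of giving up subst.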

Cheap tagging with validation

You can think of + or - in the signatures of widen and cwiden above as a kind of constraint on the F that those functions take; by contrast, subst took any F without bounds on its argument.

There are other interesting choices of constraint, like Foldable.

import scalaz.{Failure, Foldable, Success, ValidationNel}
import scalaz.syntax.std.option._
import scalaz.syntax.foldable._

// in LabelImpl, alongside def widen:
  def narrow[F[_]: Foldable](fs: F[String])
    : ValidationNel[Err, F[T]]

// in val Label
  override def narrow[F[_]: Foldable](fs: F[String]) =
    fs.foldMap{string =>
      // return errors if not OK, INil() if OK
    }.toNel cata (Failure(_), Success(fs))

This is interesting because if you pass anything and get back a Success, the succeeding value is just the argument you passed in, no reallocation necessary. (To reallocate, we would need Traverse instead of Foldable.)
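Here is a dependency-free sketch of the same idea, so you can try it without Scalaz. Everything not in the snippet above is an assumption: a tiny stand-in Foldable with only toList, Either[List[Err], ...] in place of ValidationNel, Err as a String alias, and a made-up rule that rejects empty strings.

```scala
object NarrowDemo {
  // Hypothetical minimal stand-in for scalaz.Foldable.
  trait Foldable[F[_]] { def toList[A](fa: F[A]): List[A] }

  implicit val listFoldable: Foldable[List] = new Foldable[List] {
    def toList[A](fa: List[A]): List[A] = fa
  }

  type Err = String

  sealed abstract class LabelImpl {
    type T
    def narrow[F[_]](fs: F[String])(implicit ev: Foldable[F]): Either[List[Err], F[T]]
  }

  val Label: LabelImpl = new LabelImpl {
    type T = String
    def narrow[F[_]](fs: F[String])(implicit ev: Foldable[F]): Either[List[Err], F[T]] = {
      // Hypothetical rule: empty strings are not valid labels.
      val errs = ev.toList(fs).collect { case s if s.isEmpty => "empty label" }
      // On success, return the argument itself, retyped; nothing is rebuilt.
      if (errs.isEmpty) Right(fs) else Left(errs)
    }
  }

  val input: List[String] = List("hello", "world")
  // True when the successful result is the same object as the input.
  val sameInstance: Boolean = Label.narrow(input).exists(_ eq input)
  val rejected = Label.narrow(List("ok", ""))
}
```

The Foldable instance is only used to look at the elements; the structure itself passes through untouched.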

Unidirectional without subtyping

If you prefer to avoid subtyping, you can also constrain subst variants with typeclasses indicating directionality. For Scalaz or Cats, providing both of these would be a sufficient substitute for the widen[F[+_]] introduced above.

  def widen[F[_]: Functor](ft: F[T]): F[String]
  def cwiden[F[_]: Contravariant](fs: F[String]): F[T]
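A rough sketch of how these two signatures might be implemented and used follows. The Functor and Contravariant traits below are hypothetical minimal stand-ins for the Scalaz or Cats versions, and Pred is an alias I made up for the example. Note that the instances serve purely as evidence of direction; the implementations never call map or contramap.

```scala
object TypeclassDirections {
  // Hypothetical minimal typeclasses standing in for the library ones.
  trait Functor[F[_]] { def map[A, B](fa: F[A])(f: A => B): F[B] }
  trait Contravariant[F[_]] { def contramap[A, B](fa: F[A])(f: B => A): F[B] }

  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }

  type Pred[A] = A => Boolean
  implicit val predContravariant: Contravariant[Pred] = new Contravariant[Pred] {
    def contramap[A, B](fa: Pred[A])(f: B => A): Pred[B] = b => fa(f(b))
  }

  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def widen[F[_]](ft: F[T])(implicit ev: Functor[F]): F[String]
    def cwiden[F[_]](fs: F[String])(implicit ev: Contravariant[F]): F[T]
  }

  val Label: LabelImpl = new LabelImpl {
    type T = String
    def apply(s: String): T = s
    // The evidence is required but unused: both are identity at runtime.
    def widen[F[_]](ft: F[T])(implicit ev: Functor[F]): F[String] = ft
    def cwiden[F[_]](fs: F[String])(implicit ev: Contravariant[F]): F[T] = fs
  }

  type Label = Label.T

  val strings: List[String] = Label.widen[List](List(Label("hello"), Label("world")))
  val isShort: Pred[String] = _.length < 6
  val isShortLabel: Pred[Label] = Label.cwiden[Pred](isShort)
}
```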

T = String translucency

subst and widen are very powerful, but you’re bothered by the fact that T erases to Object, and you would rather “untagging” happen automatically.

Thus far, you’ve been selectively revealing aspects of the type relationship between T and String. What if you were to globally reveal part of it?

To be clear, we must not globally reveal T = String; then there would be no usable distinction. But you can reveal weaker properties.

// in LabelImpl
  type T <: String

Now, widening happens automatically.

scala> taggedList: List[String]
res0: List[String] = List(hello, world)

scala> report: (List[Label] => Unit)
res1: List[Label] => Unit = $$Lambda$3348/1710049434@4320749b

Narrowing is still forbidden; T and String are still separate.

scala> (taggedList: List[String]): List[Label]
<console>:23: error: type mismatch;
 found   : List[String]
 required: List[hcavsc.translucent.Labels.Label]
    (which expands to)  List[hcavsc.translucent.Labels.Label.T]
       (taggedList: List[String]): List[Label]
                  ^

Moreover, erasure looks like AnyVal subclassing erasure again.

// javap -c -cp target/scala-2.12/classes hcavsc.translucent.MyFirstTests

  public java.lang.String combineLabels(java.lang.String, java.lang.String);

However, this makes it very difficult for typeclass resolution to reliably distinguish String and T. It’s also easy to accidentally untag. That’s why we took this out of Scalaz’s Tags; discriminating typeclass instances is a very useful feature of tags. If these aren’t concerns for you, globally revealed tag subtyping may be the most convenient for you.
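For experimenting with the translucent variant on its own, a minimal self-contained sketch might look like this. The object name and the reportFn value are my own, used only for illustration.

```scala
object TranslucentDemo {
  sealed abstract class LabelImpl {
    // Globally reveal that every T is a String, but not the reverse.
    type T <: String
    def apply(s: String): T
  }

  val Label: LabelImpl = new LabelImpl {
    type T = String
    def apply(s: String): T = s
  }

  type Label = Label.T

  val taggedList: List[Label] = List(Label("hello"), Label("world"))

  // Widening is ordinary subtyping now: no subst or widen call needed.
  val widened: List[String] = taggedList

  // Functions over Strings are accepted where functions over Labels are wanted.
  val reportFn: List[String] => Int = xs => xs.map(_.length).sum
  val reportLabels: List[Label] => Int = reportFn

  // String methods are available directly on a Label.
  val len: Int = Label("hello").length
}
```

The last line is the convenience and the hazard at once: any String method, and any String typeclass instance, now applies to a Label without asking.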

Boxing Ints

AnyVal might seem to have better, more justifiable boxing behavior in the case of primitive types like Int. When putting an AnyVal wrapper around Int, the custom box replaces the plain Integer box, rather than adding another layer.

final class MagicInt(val x: Int) extends AnyVal

val x = 42
val y = 84

// javap -c -cp target/scala-2.12/classes hcavsc.intsav.BytecodeTests

List(x, y)
      // skipping some setup bytecode
      13: newarray       int
      15: dup
      16: iconst_0
      17: iload_1
      18: iastore
      19: dup
      20: iconst_1
      21: iload_2
      22: iastore
      23: invokevirtual #25                 // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
      26: invokevirtual #29                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

List(new MagicInt(x), new MagicInt(y))
      // skipping more setup
      37: anewarray     #31                 // class hcavsc/intsav/MagicInt
      40: dup
      41: iconst_0
      42: new           #31                 // class hcavsc/intsav/MagicInt
      45: dup
      46: iload_1
      47: invokespecial #35                 // Method hcavsc/intsav/MagicInt."<init>":(I)V
      50: aastore
      51: dup
      52: iconst_1
      53: new           #31                 // class hcavsc/intsav/MagicInt
      56: dup
      57: iload_2
      58: invokespecial #35                 // Method hcavsc/intsav/MagicInt."<init>":(I)V
      61: aastore
      62: invokevirtual #39                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      65: invokevirtual #29                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

By contrast, the opaque T boxes to Integer when we apply(i: Int): T. It then remains in that box until we deliberately get the Int back.

// MagicInt is defined like Label,
// but over Int instead of String
val x = MagicInt(42)
// javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests
       0: getstatic     #21                 // Field hcavsc/ints/MagicInts$.MODULE$:Lhcavsc/ints/MagicInts$;
       3: invokevirtual #25                 // Method hcavsc/ints/MagicInts$.MagicInt:()Lhcavsc/ints/MagicInts$MagicIntImpl;
       6: bipush        42
       8: invokevirtual #29                 // Method hcavsc/ints/MagicInts$MagicIntImpl.apply:(I)Ljava/lang/Object;

// javap -c -cp target/scala-2.12/classes 'hcavsc.ints.MagicInts$$anon$1'
  public java.lang.Object apply(int);
    Code:
       0: aload_0
       1: iload_1
       2: invokevirtual #23                 // Method apply:(I)I
       5: invokestatic  #29                 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
       8: areturn

List(x, x)
      // skipping setup as before
      19: anewarray     #4                  // class java/lang/Object
      22: dup
      23: iconst_0
      24: aload_1
      25: aastore
      26: dup
      27: iconst_1
      28: aload_1
      29: aastore
      30: invokevirtual #43                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      33: invokevirtual #46                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

While the boxing in the above example happened in MagicInt.apply, there’s nothing special about that function’s boxing; the standard Int boxing serves just as well.

// javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests

val xs = List(42)
      44: newarray       int
      46: dup
      47: iconst_0
      48: bipush        42
      50: iastore
      51: invokevirtual #50                 // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
      54: invokevirtual #46                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;
      57: astore_2

val mxs = MagicInt.subst(xs)
      58: getstatic     #21                 // Field hcavsc/ints/MagicInts$.MODULE$:Lhcavsc/ints/MagicInts$;
      61: invokevirtual #25                 // Method hcavsc/ints/MagicInts$.MagicInt:()Lhcavsc/ints/MagicInts$MagicIntImpl;
      64: aload_2
      65: invokevirtual #54                 // Method hcavsc/ints/MagicInts$MagicIntImpl.subst:(Ljava/lang/Object;)Ljava/lang/Object;

val y: MagicInt = mxs.head
      73: invokevirtual #60                 // Method scala/collection/immutable/List.head:()Ljava/lang/Object;
      76: astore        4

This is nice for two reasons:

  1. subst still doesn’t imply any additional boxing beyond what the underlying primitive type implies.
  2. Where the primitive boxing is optimized, you get to keep those optimizations; AnyVal subclass boxing effectively turns off these optimizations. For example, Integer boxing is optimized, but MagicInt’s AnyVal class is not.
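As a runnable sketch of the first point, here is one way the tag-style MagicInt could be written out in full. This is my guess at what “defined like Label, but over Int” amounts to, with names taken from the bytecode above; the unwrap method is assumed by analogy with Label.

```scala
object MagicInts {
  sealed abstract class MagicIntImpl {
    type T
    def apply(i: Int): T
    def unwrap(m: T): Int
    def subst[F[_]](fi: F[Int]): F[T]
  }

  val MagicInt: MagicIntImpl = new MagicIntImpl {
    type T = Int
    def apply(i: Int): T = i
    def unwrap(m: T): Int = m
    def subst[F[_]](fi: F[Int]): F[T] = fi
  }

  type MagicInt = MagicInt.T

  // The list already holds boxed Integers, as any List[Int] does.
  val xs: List[Int] = List(42, 84)

  // Retagging the whole list is the identity: same cells, same boxes.
  val mxs: List[MagicInt] = MagicInt.subst[List](xs)

  val back: List[Int] = mxs.map(m => MagicInt.unwrap(m))
}
```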

The one remaining problem with the tag version of MagicInt is that its erasure is still Object.

def myId(x: MagicInt): MagicInt
// javap -c -cp target/scala-2.12/classes hcavsc.ints.OtherTests
  public abstract java.lang.Object myId(java.lang.Object);

However, if you use the “translucent” variant where it is always known that type T <: Int, the erasure is the same as Int itself.

// javap -c -cp target/scala-2.12/classes hcavsc.translucentints.OtherTests
  public abstract int myId(int);

(The boxing/unboxing of MagicInt changes to match.) Unfortunately, there’s no way to tell Scala what the erasure ought to be without exposing that extra type information, which may be quite inconvenient.

Would you box a JavaScript string?

Maybe if we weren’t working with types. Since we are working with types, we don’t have to box our strings in JavaScript in order to keep track of what sort of strings they are. But Scala might want to, anyway.

val x = new Label("hi")
js.Array(x, x)

// sbt fastOptJS output
  [new $c_Lhcavsc_av_Label().init___T("hi"),
   new $c_Lhcavsc_av_Label().init___T("hi")];

Surely it doesn’t have to for our tag-like Label. And indeed it doesn’t.

val h = Label("hi")
  // compiles to
  var h = "hi";
  // fastOptJS is smart enough to know
  // that apply can be elided

val hs = js.Array(h, h)
  // compiles to
  var hs = [h, h];

val strs = Label.subst[Lambda[x => js.Array[x] => js.Array[String]]](identity)(hs)
strs(0) + strs(1)
  // compiles to
  (("" + $as_T(hs[0])) + hs[1])
  // fastOptJS is smart enough to know
  // that subst, too, can be elided

The possible existence of subst tells us something about the deeper meaning of our abstract type definition, type T = String, that holds true no matter how much of this equality we hide behind existential layers. It is this: the compiler cannot predict when the fact that T = String will be visible, and when it will not be. It must therefore not generate code that would “go wrong” in contexts where this is revealed.

For example, at one point, we saw that

Label.subst(Monoid[String])

would indeed produce a suitable Monoid[Label]. This means not only is the value’s type reinterpreted, but also, by consequence, its members.

scala> val labelMonoid = Label.subst(Monoid[String])
labelMonoid: scalaz.Monoid[Label.T] = scalaz.std.StringInstances$stringInstance$@6f612117

scala> labelMonoid.zero
res0: hcavsc.subst.Labels.Label.T = ""

scala> labelMonoid.append _
res1: (Label.T, => Label.T) => Label.T = $$Lambda$3184/987934553@3af2619b

However, in subst, we have charged the compiler with doing this arbitrarily complex substitution with 100% accuracy and in constant time. There are no opportunities to generate “wrappers”, not for these structures that merely employ Label in their types. And, by consequence, there’s nowhere to put code that would use some means to treat Label and String differently based on runtime choices.

If you wish to automatically add “wrappers”, you have a difficult problem already with parametric polymorphism. With higher-kinded types, you have an intractable problem.
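To see the “no wrappers” claim without pulling in Scalaz, here is a small sketch with a hypothetical two-method Monoid trait of my own. The Monoid[Label] that comes out of subst is the very object that went in.

```scala
object MonoidSubst {
  // Hypothetical minimal Monoid, standing in for scalaz.Monoid.
  trait Monoid[A] {
    def zero: A
    def append(x: A, y: A): A
  }

  val stringMonoid: Monoid[String] = new Monoid[String] {
    def zero: String = ""
    def append(x: String, y: String): String = x + y
  }

  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def unwrap(lbl: T): String
    def subst[F[_]](fs: F[String]): F[T]
  }

  val Label: LabelImpl = new LabelImpl {
    type T = String
    def apply(s: String): T = s
    def unwrap(lbl: T): String = lbl
    def subst[F[_]](fs: F[String]): F[T] = fs
  }

  type Label = Label.T

  // Reinterprets the instance and, by consequence, all of its members.
  val labelMonoid: Monoid[Label] = Label.subst[Monoid](stringMonoid)

  val combined: Label = labelMonoid.append(Label("foo"), Label("bar"))
}
```

Since the two monoids are one object, there is nowhere for per-Label behavior to hide; append on Labels is String concatenation and nothing else.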

Speaking of higher-kinded types…

Type tagging works perfectly well with parameterized types.

type KWConcrete[W, A, B] = Kleisli[(W, ?), A, B]

sealed abstract class KWImpl {
  type T[W, A, B]

  def subst[F[_[_, _, _]]](fk: F[KWConcrete]): F[T]
}

val KW: KWImpl = new KWImpl {
  type T[W, A, B] = KWConcrete[W, A, B]

  override def subst[F[_[_, _, _]]](fk: F[KWConcrete]) = fk
}

type KW[W, A, B] = KW.T[W, A, B]

This is nice for a few reasons.

  1. You can still “add a type parameter” to do abstraction on your tagged types.
  2. You can hide much of the complexity of a monad transformer stack, allowing it to infer more easily with Unapply or -Ypartial-unification. This is because, unlike standalone type aliases, scalac can’t dealias your abstraction away. (Warning: this doesn’t apply if you make the type T “translucent”; hide your types to keep them safe from scalac’s prying expander.)
  3. You can use subst to “GND” your Monad and other typeclass instances.
implicit def monadKW[W: Monoid, A]: Monad[KW[W, A, ?]] = {
  type MF[KWC[_, _, _]] = Monad[KWC[W, A, ?]]
  // KW.subst[MF](implicitly) with better inference
  KW.subst[MF](Kleisli.kleisliMonadReader[(W, ?), A])
}

“Tagless final effects à la Ermine Writers” develops this kind of type abstraction in another direction.

For the derivation of subst’s weird signature above, see “Higher Leibniz”.
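If you want to play with a parameterized tag without Scalaz or Kind Projector, a smaller sketch in the same shape might look like this. MultiMap, Concrete, wrap, and unwrap are all hypothetical names of mine; the construction just follows KW above with a two-parameter type.

```scala
object HigherKindedTag {
  type Concrete[K, V] = Map[K, List[V]]

  sealed abstract class MultiMapImpl {
    type T[K, V]
    def subst[F[_[_, _]]](fc: F[Concrete]): F[T]
  }

  val MultiMap: MultiMapImpl = new MultiMapImpl {
    type T[K, V] = Concrete[K, V]
    def subst[F[_[_, _]]](fc: F[Concrete]): F[T] = fc
  }

  type MultiMap[K, V] = MultiMap.T[K, V]

  // Tag one value: pick the F that applies its argument at K and V.
  def wrap[K, V](m: Concrete[K, V]): MultiMap[K, V] = {
    type At[M[_, _]] = M[K, V]
    MultiMap.subst[At](m)
  }

  // Untag one value: substitute inside the argument of a function.
  def unwrap[K, V](m: MultiMap[K, V]): Concrete[K, V] = {
    type Flip[M[_, _]] = M[K, V] => Concrete[K, V]
    MultiMap.subst[Flip]((c: Concrete[K, V]) => c)(m)
  }

  val raw: Map[String, List[Int]] = Map("a" -> List(1, 2))
  val tagged: MultiMap[String, Int] = wrap(raw)
  // True when the round trip hands back the original map object.
  val roundTripIsSame: Boolean = unwrap(tagged) eq raw
}
```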

Why is the : LabelImpl ascription so important?

Suppose that you ignored my comments and defined the concrete LabelImpl without an ascription.

val Label = new LabelImpl {
  // ...implementation continues as before

Then, the abstraction would disappear; you would no longer have a “new type”.

scala> val lbl: Label = "hi"
lbl: Label = hi

scala> lbl: String
res0: String = hi

scala> implicitly[Label =:= String]
res1: =:=[Label,String] = <function1>

Why did it break so hard? Well, the inferred type of val Label is different from the one you were ascribing.

scala> Label
res2: LabelImpl{type T = String} = hcavsc.broken.Labels$$anon$1@48cd7b32

That means that Label.T is no longer existential; it’s known, and known to be String. Accordingly, type Label also expands to String, and vice versa.

If you want a new type, you must keep it existential.

Some background

The unboxed tagging technique is based on cast-free type tags in the upcoming Scalaz 7.3.0. That, in turn, was based on use of existential types in Ermine's implementation to hide expansions from scalac.

This is also a specialization of the type-member based MTL encoding I used in "Tagless final effects à la Ermine Writers". The essential difference is that individual program elements were universally quantified over the expansion of the abstract type, whereas here, the entire program is universally quantified over that expansion, because the existential quantifier is globally bound.

I’m certainly not the first person to explore this technique; for example, Julian Michael wrote about it several months before this article.

And, of course, if you are an ML (OCaml, SML, &c) fan, you’re probably thinking “yeah, so what? I do this all the time.” Sorry. We can be a little slow on the uptake in Scala world, where we greatly undervalue the ideas of the functional languages before us.

This article was tested with Scala 2.12.1, Scalaz 7.2.10, Scala.js 0.6.13, and Kind Projector 0.9.3. The code is available in compilable form for your own experiments via Bazaar.

The High Cost of AnyVal subclasses...

The claim of a multi-paradigm language is to harmoniously serve various approaches to programming. The AnyVal subclass feature forms a strong counterargument to Scala’s multiparadigm claim.

AnyVal subclasses penalize parametric-polymorphic, type-safe programming, in order to better support type-unsafe programming styles, such as those making use of isInstanceOf. They sneakily shift the blame for their performance problems onto type safety and polymorphism. I will provide an existence proof that the blame ought to land squarely on AnyVal subclasses, but I cannot stop this blame-shifting from lending further credence to the witticism “If scala was the only language I had to think in, I’d think functional programming was a bad idea that didn’t scale, too.”

Moreover, by creating the false impression that the “newtype problem” has been solved in Scala, AnyVal subclasses obscure solutions that better serve polymorphic, type-safe programming. While I describe such a solution in this article, I have no illusions that I alone can reverse the upward trend of the AnyVal meme.

Scala, today, has the potential to better support type-safe programming, and it has since before the advent of AnyVal subclasses. In this article, we will focus on how the language could reveal this potential, becoming a better foundation for polymorphic, type-safe programming than it advertises today.

A String reference must be boxed

Suppose that you want a “wrapper” around Strings with a unique type so that they can’t be accidentally confused with arbitrary Strings. This is a common use case for a newtype, a wrapper with intentionally incompatible type that exists only at compile time. (The name “newtype” comes from the Haskell keyword for its version of this feature.)

You decide to use extends AnyVal, since you have heard that this is a compile-time-only class that doesn’t get allocated on the heap.

class Label(val str: String) extends AnyVal

object Label {
  def apply(s: String): Label =
    new Label(s)
}

This seems to do the trick with your first several tests.

class MyFirstTests {
  def combineLabels(l: Label, r: Label): Label =
    Label(l.str + r.str)

  def printLabels(): Unit = {
    val fst = Label("hello")
    val snd = Label("world")
    println(fst.str)
    println(snd.str)
  }
}

As reported by javap, the new Label goes away for Label.apply.

// javap -c -cp target/scala-2.12/classes hcavsc.av.Label$

  public java.lang.String apply(java.lang.String);
    Code:
       0: aload_1
       1: areturn

It vanishes for the signature of combineLabels too, meaning that we can write some functions over Labels without allocating them.

// javap -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

  public java.lang.String combineLabels(java.lang.String, java.lang.String);

You can even use Label in a case class, and it will be String at runtime.

case class Labelled[A](lbl: Label, a: A)

// javap -p -cp target/scala-2.12/classes hcavsc.av.Labelled

  private final java.lang.String lbl;
  private final A a;

But then, you decide that you want a List of Labels.

// add to printLabels
val lbls = List(fst, snd)

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

      24: iconst_2
      25: anewarray     #56                 // class hcavsc/av/Label
      28: dup
      29: iconst_0
      30: new           #56                 // class hcavsc/av/Label
      33: dup
      34: aload_1
      35: invokespecial #59                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      38: aastore
      39: dup
      40: iconst_1
      41: new           #56                 // class hcavsc/av/Label
      44: dup
      45: aload_2
      46: invokespecial #59                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      49: aastore
      50: invokevirtual #63                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      53: invokevirtual #66                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

Huh. Didn’t expect those two news to be there. Ah well, maybe now that they’re in the list,

lbls.map{x => Label(x.str + "Aux")}

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

  public static final java.lang.Object $anonfun$printLabels$1$adapted(java.lang.Object);
    Code:
       0: new           #61                 // class hcavsc/av/Label
       3: dup
       4: aload_0
       5: checkcast     #61                 // class hcavsc/av/Label
       8: invokevirtual #117                // Method hcavsc/av/Label.str:()Ljava/lang/String;
      11: invokestatic  #119                // Method $anonfun$printLabels$1:(Ljava/lang/String;)Ljava/lang/String;
      14: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      17: areturn

OK, sure, so you took it out and put it back, so it unboxed and then boxed again. How about a tuple, instead?

// add to printLabels
(fst, snd)

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

      73: new           #103                // class scala/Tuple2
      76: dup
      77: new           #61                 // class hcavsc/av/Label
      80: dup
      81: aload_1
      82: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      85: new           #61                 // class hcavsc/av/Label
      88: dup
      89: aload_2
      90: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      93: invokespecial #106                // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)V

Two more news. Fine. How about the identity method?

// add to printLabels
identity(fst)

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

      97: getstatic     #59                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
     100: new           #61                 // class hcavsc/av/Label
     103: dup
     104: aload_1
     105: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
     108: invokevirtual #109                // Method scala/Predef$.identity:(Ljava/lang/Object;)Ljava/lang/Object;

So there seems to be an impressive collection of things that will cause an AnyVal subclass to box. You assume there’s a good reason they implemented it this way; we’ll get into that later.

No boxing with type tags

However, you decide to look for an alternative newtype mechanism that doesn’t box, under the theory that scalac’s reasons for boxing AnyVal subclasses don’t apply to the use cases you have in mind for Label and similar things in your codebase.

You have heard that Scalaz’s “type tags” are a kind of newtype with no boxing. You could just pull in scalaz-core and see if you can get them to work, but decide to implement Label directly using the same technique as Scalaz tags, instead.

object Labels {
  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def unwrap(lbl: T): String
  }

  // do not forget `: LabelImpl`; it is key
  val Label: LabelImpl = new LabelImpl {
    type T = String
    override def apply(s: String) = s
    override def unwrap(lbl: T) = lbl
  }

  type Label = Label.T
}

import Labels._

While regretting that the compiler no longer makes your Label type very convenient to define, you press on. First, to confirm, you can’t treat an arbitrary String as a Label:

scala> "hi there": Label
<console>:15: error: type mismatch;
 found   : String("hi there")
 required: hcavsc.subst.Labels.Label
    (which expands to)  hcavsc.subst.Labels.Label.T
       "hi there": Label
       ^

So far, so good. Then, why not retry some of the earlier experiments that caused the AnyVal-based label to box?

// javap -c -cp target/scala-2.12/classes hcavsc.subst.MyFirstTests

val fst = Label("hello")
val snd = Label("world")
identity(fst)
      24: getstatic     #43                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
      27: aload_1
      28: invokevirtual #47                 // Method scala/Predef$.identity:(Ljava/lang/Object;)Ljava/lang/Object;

(fst, snd)
      32: new           #49                 // class scala/Tuple2
      35: dup
      36: aload_1
      37: aload_2
      38: invokespecial #53                 // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)V

val lbls = List(fst, snd)
      48: iconst_2
      49: anewarray     #4                  // class java/lang/Object
      52: dup
      53: iconst_0
      54: aload_1
      55: aastore
      56: dup
      57: iconst_1
      58: aload_2
      59: aastore
      60: invokevirtual #62                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      63: invokevirtual #65                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

lbls.map{x => Label(Label.unwrap(x) + "Aux")}
  public static final java.lang.Object $anonfun$printLabels$1(java.lang.Object);
    Code:
       0: getstatic     #26                 // Field hcavsc/subst/Labels$.MODULE$:Lhcavsc/subst/Labels$;
       3: invokevirtual #30                 // Method hcavsc/subst/Labels$.Label:()Lhcavsc/subst/Labels$LabelImpl;
       6: new           #104                // class java/lang/StringBuilder
       9: dup
      10: invokespecial #106                // Method java/lang/StringBuilder."<init>":()V
      13: getstatic     #26                 // Field hcavsc/subst/Labels$.MODULE$:Lhcavsc/subst/Labels$;
      16: invokevirtual #30                 // Method hcavsc/subst/Labels$.Label:()Lhcavsc/subst/Labels$LabelImpl;
      19: aload_0
      20: invokevirtual #110                // Method hcavsc/subst/Labels$LabelImpl.unwrap:(Ljava/lang/Object;)Ljava/lang/String;
      23: invokevirtual #114                // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      26: ldc           #116                // String Aux
      28: invokevirtual #114                // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      31: invokevirtual #120                // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      34: invokevirtual #36                 // Method hcavsc/subst/Labels$LabelImpl.apply:(Ljava/lang/String;)Ljava/lang/Object;
      37: areturn

No allocation there. Hmm. Well, maybe our concrete LabelImpl instance is doing some secret boxing?

// javap -c -cp target/scala-2.12/classes 'hcavsc.subst.Labels$$anon$1'

  public java.lang.String apply(java.lang.String);
    Code:
       0: aload_1
       1: areturn

  public java.lang.Object apply(java.lang.String);
    Code:
       0: aload_0
       1: aload_1
       2: invokevirtual #27                 // Method apply:(Ljava/lang/String;)Ljava/lang/String;
       5: areturn

  public java.lang.String unwrap(java.lang.String);
    Code:
       0: aload_1
       1: areturn

  public java.lang.String unwrap(java.lang.Object);
    Code:
       0: aload_0
       1: aload_1
       2: checkcast     #21                 // class java/lang/String
       5: invokevirtual #23                 // Method unwrap:(Ljava/lang/String;)Ljava/lang/String;
       8: areturn

No boxing there. That makes sense; in that context, Label is String; the fact that our Label-using code doesn’t know that is irrelevant, because we hid that information using existential types.

So, it is possible to have a newtype mechanism that doesn’t box. You don’t have to wait for the JVM to deliver its own brand of value types; you can even implement it yourself, in Scala, today. They must have had another reason for all this boxing, because “we have to because JVM” is denied by the behavior of Scala-JVM itself.

You aren’t sure what those reasons are, but you decide to port the rest of your code to use the existential Label. Befitting an unboxed newtype, the runtime representation of List[Label] is exactly the same as the underlying List[String], as well as every Option, Either, and whatever else you can think up.

You notice that the erasure for Label is different, but this seems significantly less serious than the boxing problem, so you leave it for now. (We will dig into related design decisions later.)

What can you do with a box? What can you do without a box?

Let’s start with a quick comparison of boxing AnyVal and the “type tagging” mechanism we’ve just seen.

Capability | AnyVal subclass | Type tag
Defining methods | normal override; virtual method dispatch available | implicit class enrichment only
lbl.getClass | Label | String
Cast Any to Label | checked at runtime | unchecked; no wrapper left at runtime
isInstanceOf | checked at runtime | unchecked; same reason casting doesn’t work
Adding type parameters to methods | boxing/unboxing penalty | no boxing penalty
Wrapping a List | O(n): box every element and reallocate list itself | O(1), with subst: no allocation, output list eq to input list
Unwrapping a list | O(n): reallocate list, unbox each element | O(1): eq output with subst. Also possible to make unwrapping a <: (free liftable automatic upcast)
Coinductive type class instances | works; boxing penalty applies | works; no boxing penalty
Wrapping whole program parts | each function must be wrapped to add per-value wrapping/unwrapping | O(1): just works with subst

I detect from this matrix a particular theme: AnyVal subclasses give up a lot of capability in the type-safe arena. Consider rewriting a loop that uses Label as state as a foldLeft: you must contend with a new boxing/unboxing penalty, since the state parameter in a foldLeft is type-parametric. It’s more fodder for the persistent higher-order function skeptics among us.
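To make the foldLeft point concrete, here is a minimal sketch. It assumes the AnyVal-based Label has the shape `class Label(val str: String) extends AnyVal`; the names FoldSketch, longestLoop and longestFold are hypothetical and exist only for this illustration.

```scala
object FoldSketch {
  // Assumed shape of the AnyVal-based Label; a value class may be a
  // member of an object.
  final class Label(val str: String) extends AnyVal

  // Monomorphic loop: `best` has static type Label everywhere, so the
  // compiler is free to represent it as the underlying String.
  def longestLoop(xs: List[String]): Label = {
    var best = new Label("")
    var rest = xs
    while (rest.nonEmpty) {
      if (rest.head.length > best.str.length) best = new Label(rest.head)
      rest = rest.tail
    }
    best
  }

  // The same logic as a fold: the accumulator now flows through
  // foldLeft's type parameter B, which erases to Object, so each Label
  // handed to or returned from the function is wrapped and unwrapped.
  def longestFold(xs: List[String]): Label =
    xs.foldLeft(new Label("")) { (best, s) =>
      if (s.length > best.str.length) new Label(s) else best
    }

  def main(args: Array[String]): Unit = {
    val words = List("a", "abc", "ab")
    println(longestLoop(words).str) // abc
    println(longestFold(words).str) // abc
  }
}
```

Both functions compute the same answer; the difference should only show up in the bytecode, which you can inspect with javap as before.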

While we know that adding type parameters to our functions improves type-safety, the skeptic will note the boxing penalty, and attribute it to parametric polymorphism. But we know the true culprit.

If AnyVal subclassing taxes type-safe programming in these ways, what is it spending the money on? Simple: support for isInstanceOf, “safe” casting, implementing interfaces, overriding AnyRef methods like toString, and the like.

As type-safe, parametrically-polymorphic programmers, we avoid these features, as a matter of principle and of practice. Some, like checked casting, are simply not type-safe. Some ruin free theorems, like toString, and we would prefer safe mechanisms, like the Show typeclass, to actually tell us at compile time if our programs make sense. Yet, if we use AnyVal subclasses, we have to pay the price for all the programmers that wish to write type-unsafe code, like List[Any] => List[Label]. All is not well in Multiparadigmatic Land.

When will our methods be resolved?

To showcase the relationship of the two approaches to runtime-reflective programming versus statically-proven programming, let’s consider stringification.

Scala provides the toString virtual method on Any. Calling this method is dynamically resolved on the value itself; it is as if every value must carry around a pointer to a function that, given itself, returns a String. We can define this for our original AnyVal-based Label, and so toString on List et al will also work.

// add to class Label
  override def toString = s"Label($str)"

scala> List(fst, snd).toString
res0: String = List(Label(hello), Label(world))

scala> Some(fst).toString
res1: String = Some(Label(hello))

Moreover, this “works” even for the type List[Any].

scala> List[Any](fst, "hi").toString
res2: String = List(Label(hello), hi)

You cannot override toString for our fully-erased Label. After all, every Label is just a String at runtime! (Different types, same class.)

However, the type-safe programmer will recognize List[Any] as a type that, if it occurs in her program, means “something has gone wrong with this program”. Moreover, because toString doesn’t make sense for all types, we use a static mechanism, like the scalaz.Show typeclass. And this works fine for Label, because it is statically resolved by type, not dependent on an implicit runtime member of every Label; in fact, it can only work because it is static!

// add to object Labels
  import scalaz.Show

  implicit val showLabel: Show[Label] =
    Show shows {lbl =>
      s"Label(${Label.unwrap(lbl)})"}

scala> import scalaz.syntax.show._, scalaz.std.list._,
              scalaz.std.option._

scala> List(fst, snd).shows
res1: String = [Label(hello),Label(world)]

scala> some(fst).shows
res2: String = Some(Label(hello))

So if you are doing this kind of programming, it doesn’t matter that you can’t override toString, perform type tests, &c; you weren’t doing those things anyway. But, aside from a little performance bump, what do you gain from unboxed type-tagging?

When is a Label a String? When is it not?

You notice that subst is at the foundation of several Scalaz constructs like Leibniz and Liskov, and plays a prominent role in the Tag API as well. You decide to add this to your LabelImpl as well.

// in LabelImpl
def subst[F[_]](fs: F[String]): F[T]

// and in val Label
override def subst[F[_]](fs: F[String]) = fs

It’s interesting that you can use this to tag a whole List[String] in constant time:

scala> val taggedList = Label.subst(List("hello", "world"))
taggedList: List[Label.T] = List(hello, world)

It’s also interesting that you can use this to untag a whole list in constant time.

scala> Label.subst[Lambda[x => List[x] => List[String]]](identity)(taggedList)
res0: List[String] = List(hello, world)
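If you want to convince yourself that no copying happens, the following self-contained sketch checks reference equality with eq. It restates a LabelImpl of the assumed shape (apply, unwrap, subst) so that it compiles on its own. The object name SubstSketch and the alias Untag are hypothetical; Untag stands in for the Lambda[...] type so that no compiler plugin is needed.

```scala
import scala.language.higherKinds

object SubstSketch {
  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def unwrap(lbl: T): String
    def subst[F[_]](fs: F[String]): F[T]
  }

  // Inside this instance T = String, so every method is the identity.
  val Label: LabelImpl = new LabelImpl {
    type T = String
    override def apply(s: String) = s
    override def unwrap(lbl: T) = lbl
    override def subst[F[_]](fs: F[String]) = fs
  }
  type Label = Label.T

  // Plugin-free stand-in for Lambda[x => List[x] => List[String]].
  type Untag[X] = List[X] => List[String]

  val raw: List[String] = List("hello", "world")
  // Tag the whole list at once; F is inferred as List.
  val tagged: List[Label] = Label.subst(raw)
  // Untag by substituting inside the argument position of a function.
  val untagged: List[String] = Label.subst[Untag](identity)(tagged)

  def main(args: Array[String]): Unit = {
    println(tagged eq raw)   // true: same list object, nothing rebuilt
    println(untagged eq raw) // true
  }
}
```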

Functions and typeclass instances can be tagged or untagged, too.

scala> Label.subst[Lambda[x => (x, Int) => x]](_ substring _)
res1: (Label.T, Int) => Label.T = $$Lambda$3194/964109489@72557d64

scala> import scalaz.Monoid, scalaz.std.string._

scala> Label.subst(Monoid[String])
res3: scalaz.Monoid[Label.T] = scalaz.std.StringInstances$stringInstance$@252798fe

All of this works because subst is really evidence that, deep down, String and Label are the same.

scala> import scalaz.Leibniz, Leibniz.{===, refl}

scala> Label.subst[String === ?](refl)
res4: Leibniz[Nothing,Any,String,Label.T] = scalaz.Leibniz$$anon$2@702af12c

Yet, you ran an experiment earlier to prove that you can’t confuse String and Label; indeed, this still holds true, despite the presence of subst!

scala> "still a string": Label
<console>:21: error: type mismatch;
 found   : String("still a string")
 required: hcavsc.subst.Labels.Label
    (which expands to)  hcavsc.subst.Labels.Label.T
       "still a string": Label
       ^

scala> Label("still a label"): String
<console>:21: error: type mismatch;
 found   : hcavsc.subst.Labels.Label.T
 required: String
       Label("still a label"): String
            ^

Here’s what’s happening: in a sense, (new Label(_)): (String => Label) and (_.str): (Label => String) witness that there’s a conversion between the two types. subst witnesses that there’s identical runtime representation between its own two types. You get to selectively reveal this evidence when it makes writing your program more convenient; the rest of the time, it is hidden.

But I would like to step one level up: this is a design space, and subst as we have seen it isn’t appropriate for all designs. As the author of your own abstract newtypes, you get to choose how much, if any, of this underlying type equality to reveal.

If subst is the right choice

For various reasons, the above is how Scalaz Tag (@@) is defined. If you wish these semantics, you might as well throw everything else away and write

import scalaz.{@@, Tag}

sealed trait LabelTag // no instances
type Label = String @@ LabelTag
val Label = Tag.of[LabelTag]

and take advantage of the convenient tools around subst defined in Tag.Of. But it’s not the only choice! It’s one point in the design space. To do right by your API users, it’s worth exploring that design space a little more.

Type-unsafe code isn’t type-safe

Unboxed existential tagging spreads through your codebase. You feel free to apply it liberally, because you know you aren’t paying the wrapping costs of AnyVal subclasses; all these new abstraction layers are pure type-level, and fully erased.

You receive a “bug report” from a fellow developer that this expression never seems to filter out the non-label Strings.

(xs: List[Any]).collect{case t: Label => t}
<console>:16: warning: abstract type pattern
 hcavsc.translucent.Labels.Label.T
 (the underlying of hcavsc.translucent.Labels.Label)
 is unchecked since it is eliminated by erasure
       (xs: List[Any]).collect{case t: Label => t}
                                       ^
<console>:16: warning: The outer reference
 in this type test cannot be checked at run time.
       (xs: List[Any]).collect{case t: Label => t}
                                    ^

Your mind on safe pattern matching practice, you add def unapply(s: String): Option[T] to LabelImpl and counsel preference for the form case Label(t) => ..., as well as to not ignore -unchecked warnings.
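A minimal sketch of what that extractor might look like follows. The article only gives the signature of unapply, so the acceptance rule here (non-empty strings only) is purely hypothetical, as are the names UnapplySketch and labelsIn.

```scala
object UnapplySketch {
  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def unwrap(lbl: T): String
    def unapply(s: String): Option[T]
  }

  val Label: LabelImpl = new LabelImpl {
    type T = String
    override def apply(s: String) = s
    override def unwrap(lbl: T) = lbl
    // Hypothetical rule, for illustration only: a label is non-empty.
    override def unapply(s: String) = if (s.nonEmpty) Some(s) else None
  }
  type Label = Label.T

  // Because unapply takes a String, the pattern first performs a
  // checked test against the String class, then runs the rule above.
  // No unchecked test against the erased type T is involved.
  def labelsIn(xs: List[Any]): List[Label] =
    xs.collect { case Label(t) => t }

  def main(args: Array[String]): Unit = {
    val found = labelsIn(List("hello", 42, "", "world"))
    println(found.map(Label.unwrap)) // List(hello, world)
  }
}
```

Unlike the type pattern, this form says explicitly what qualifies as a label, instead of leaning on a runtime class that Label no longer has.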

You get another bug report that this always seems to succeed.

(s: String).asInstanceOf[Label]

Repeating your advice about warnings, you start to wonder, “where is this kind of code coming from?”

Someone else complains that they want to make T extend Ordered[T], and can’t fathom where the code should go. You advise the static approach of implementing the Ordering typeclass instance instead for T, wonder how deep the object-orientation hole goes, and forward the link about the typeclass pattern again, too.
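One way to follow that advice, sketched under the assumption that LabelImpl exposes subst as shown earlier, is to lift the existing Ordering[String] rather than write a new comparison. The names OrderingSketch, labelOrdering and sortedLabels are hypothetical.

```scala
import scala.language.higherKinds

object OrderingSketch {
  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def unwrap(lbl: T): String
    def subst[F[_]](fs: F[String]): F[T]
  }

  val Label: LabelImpl = new LabelImpl {
    type T = String
    override def apply(s: String) = s
    override def unwrap(lbl: T) = lbl
    override def subst[F[_]](fs: F[String]) = fs
  }
  type Label = Label.T

  // The instance lives beside the type instead of inside it; subst
  // reuses the String ordering as-is, with no wrapper around it.
  implicit val labelOrdering: Ordering[Label] =
    Label.subst[Ordering](Ordering[String])

  // sorted picks up labelOrdering from the enclosing scope.
  val sortedLabels: List[Label] =
    List(Label("world"), Label("hello")).sorted

  def main(args: Array[String]): Unit =
    println(sortedLabels.map(Label.unwrap)) // List(hello, world)
}
```

If Label should sort differently from String, Ordering.by(Label.unwrap) or a hand-written instance would go in the same place.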

Suppose you went back to AnyVal

We’ve seen that AnyVal subclasses could have been incredibly cheap, but aren’t, so as to support “features” like checked casting. Who’s going to foot the bill?

  1. Oh, this allocates when passing through polymorphic contexts, but not monomorphic ones? Avoid polymorphic code.
  2. Oh, this extra type-safety adds all this allocation? Type safety is expensive at runtime; we need to stick to String.
  3. We can’t do any better; the JVM limits the possibilities. You have to pay for runtime wrapping if you want a wrapper type.

In this article, I have demonstrated that none of these conclusions are correct. However, only a tiny minority of Scala practitioners will ever read this article, and I will not blame the rest for drawing these seemingly straightforward inferences, ultimately faulty as they are.

The real cost of AnyVal subclasses is not all the needless memory allocation. The real cost is the damage to the practice of type-safe programming in Scala. It’s in all the curious developers who sought to add a little more type safety to their programs, only to find themselves penalized by the runtime, once bitten. It’s in the reinforcement of this attitude towards abstraction that they’ll continue to carry with them, the next time an opportunity presents itself. It’s a missed opportunity for pure type-level thinking, all so that asInstanceOf “works”.

See “…and the glorious subst to come” for further development of the ideas in this article.

This article was tested with Scala 2.12.1, Scalaz 7.2.10, and Kind Projector 0.9.3. The code is available in compilable form for your own experiments via Bazaar.

About Me

I am S11001001, s11 for short.  Programmer and Free Software enthusiast.

Search for my name to see more stuff about me; no one shares my real name, and no one shares my username, though I can't understand why.