Sunday, April 9, 2017

The High Cost of AnyVal subclasses...

The claim of a multi-paradigm language is to harmoniously serve various approaches to programming. The AnyVal subclass feature forms a strong counterargument to Scala’s multiparadigm claim.

AnyVal subclasses penalize parametric-polymorphic, type-safe programming, in order to better support type-unsafe programming styles, such as those making use of isInstanceOf. They sneakily shift the blame for their performance problems onto type safety and polymorphism. I will provide an existence proof that the blame ought to land squarely on AnyVal subclasses, but I cannot stop this blame-shifting from lending further credence to the witticism “If scala was the only language I had to think in, I’d think functional programming was a bad idea that didn’t scale, too.”

Moreover, by creating the false impression that the “newtype problem” has been solved in Scala, AnyVal subclasses obscure solutions that better serve polymorphic, type-safe programming. While I describe such a solution in this article, I have no illusions that I alone can reverse the upward trend of the AnyVal meme.

Scala, today, has the potential to better support type-safe programming, and it has since before the advent of AnyVal subclasses. In this article, we will focus on how the language could reveal this potential, becoming a better foundation for polymorphic, type-safe programming than it advertises today.

A String reference must be boxed

Suppose that you want a “wrapper” around Strings with a unique type so that they can’t be accidentally confused with arbitrary Strings. This is a common use case for a newtype, a wrapper with intentionally incompatible type that exists only at compile time. (The name “newtype” comes from the Haskell keyword for its version of this feature.)

You decide to use extends AnyVal, since you have heard that this is a compile-time-only class that doesn’t get allocated on the heap.

class Label(val str: String) extends AnyVal

object Label {
  def apply(s: String): Label =
    new Label(s)
}

This seems to do the trick with your first several tests.

class MyFirstTests {
  def combineLabels(l: Label, r: Label): Label =
    Label(l.str + r.str)

  def printLabels(): Unit = {
    val fst = Label("hello")
    val snd = Label("world")
    println(fst.str)
    println(snd.str)
  }
}

As reported by javap, the new Label goes away for Label.apply.

// javap -c -cp target/scala-2.12/classes hcavsc.av.Label$

  public java.lang.String apply(java.lang.String);
    Code:
       0: aload_1
       1: areturn

It vanishes for the signature of combineLabels too, meaning that we can write some functions over Labels without allocating them.

// javap -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

  public java.lang.String combineLabels(java.lang.String, java.lang.String);

You can even use Label in a case class, and it will be String at runtime.

case class Labelled[A](lbl: Label, a: A)

// javap -p -cp target/scala-2.12/classes hcavsc.av.Labelled

  private final java.lang.String lbl;
  private final A a;

But then, you decide that you want a List of Labels.

// add to printLabels
val lbls = List(fst, snd)

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

      24: iconst_2
      25: anewarray     #56                 // class hcavsc/av/Label
      28: dup
      29: iconst_0
      30: new           #56                 // class hcavsc/av/Label
      33: dup
      34: aload_1
      35: invokespecial #59                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      38: aastore
      39: dup
      40: iconst_1
      41: new           #56                 // class hcavsc/av/Label
      44: dup
      45: aload_2
      46: invokespecial #59                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      49: aastore
      50: invokevirtual #63                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      53: invokevirtual #66                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

Huh. Didn’t expect those two news to be there. Ah well, maybe now that they’re in the list,

lbls.map{x => Label(x.str + "Aux")}

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

  public static final java.lang.Object $anonfun$printLabels$1$adapted(java.lang.Object);
    Code:
       0: new           #61                 // class hcavsc/av/Label
       3: dup
       4: aload_0
       5: checkcast     #61                 // class hcavsc/av/Label
       8: invokevirtual #117                // Method hcavsc/av/Label.str:()Ljava/lang/String;
      11: invokestatic  #119                // Method $anonfun$printLabels$1:(Ljava/lang/String;)Ljava/lang/String;
      14: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      17: areturn

OK, sure, so you took it out and put it back, so it unboxed and then boxed again. How about a tuple, instead?

// add to printLabels
(fst, snd)

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

      73: new           #103                // class scala/Tuple2
      76: dup
      77: new           #61                 // class hcavsc/av/Label
      80: dup
      81: aload_1
      82: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      85: new           #61                 // class hcavsc/av/Label
      88: dup
      89: aload_2
      90: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
      93: invokespecial #106                // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)Vf

Two more news. Fine. How about the identity method?

// add to printLabels
identity(fst)

// javap -c -cp target/scala-2.12/classes hcavsc.av.MyFirstTests

      97: getstatic     #59                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
     100: new           #61                 // class hcavsc/av/Label
     103: dup
     104: aload_1
     105: invokespecial #64                 // Method hcavsc/av/Label."<init>":(Ljava/lang/String;)V
     108: invokevirtual #109                // Method scala/Predef$.identity:(Ljava/lang/Object;)Ljava/lang/Object;

So there seems to be an impressive collection of things that will cause an AnyVal subclass to box. You assume there’s a good reason they implemented it this way; we’ll get into that later.

No boxing with type tags

However, you decide to look for an alternative newtype mechanism that doesn’t box, under the theory that scalac’s reasons for boxing AnyVal subclasses don’t apply to the use cases you have in mind for Label and similar things in your codebase.

You have heard that Scalaz’s “type tags” are a kind of newtype with no boxing. You could just pull in scalaz-core and see if you can get them to work, but decide to implement Label directly using the same technique as Scalaz tags, instead.

object Labels {
  sealed abstract class LabelImpl {
    type T
    def apply(s: String): T
    def unwrap(lbl: T): String
  }

  // do not forget `: LabelImpl`; it is key
  val Label: LabelImpl = new LabelImpl {
    type T = String
    override def apply(s: String) = s
    override def unwrap(lbl: T) = lbl
  }

  type Label = Label.T
}

import Labels._

While regretting that the compiler no longer makes your Label type very convenient to define, you press on. First, to confirm, you can’t treat an arbitrary String as a Label:

scala> "hi there": Label
<console>:15: error: type mismatch;
 found   : String("hi there")
 required: hcavsc.subst.Labels.Label
    (which expands to)  hcavsc.subst.Labels.Label.T
       "hi there": Label
       ^

So far, so good. Then, why not retry some of the earlier experiments that caused the AnyVal-based label to box?

// javap -c -cp target/scala-2.12/classes hcavsc.subst.MyFirstTests

val fst = Label("hello")
val snd = Label("world")
identity(fst)
      24: getstatic     #43                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
      27: aload_1
      28: invokevirtual #47                 // Method scala/Predef$.identity:(Ljava/lang/Object;)Ljava/lang/Object;

(fst, snd)
      32: new           #49                 // class scala/Tuple2
      35: dup
      36: aload_1
      37: aload_2
      38: invokespecial #53                 // Method scala/Tuple2."<init>":(Ljava/lang/Object;Ljava/lang/Object;)V

val lbls = List(fst, snd)
      48: iconst_2
      49: anewarray     #4                  // class java/lang/Object
      52: dup
      53: iconst_0
      54: aload_1
      55: aastore
      56: dup
      57: iconst_1
      58: aload_2
      59: aastore
      60: invokevirtual #62                 // Method scala/Predef$.genericWrapArray:(Ljava/lang/Object;)Lscala/collection/mutable/WrappedArray;
      63: invokevirtual #65                 // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala/collection/immutable/List;

lbls.map{x => Label(Label.unwrap(x) + "Aux")}
  public static final java.lang.Object $anonfun$printLabels$1(java.lang.Object);
    Code:
       0: getstatic     #26                 // Field hcavsc/subst/Labels$.MODULE$:Lhcavsc/subst/Labels$;
       3: invokevirtual #30                 // Method hcavsc/subst/Labels$.Label:()Lhcavsc/subst/Labels$LabelImpl;
       6: new           #104                // class java/lang/StringBuilder
       9: dup
      10: invokespecial #106                // Method java/lang/StringBuilder."<init>":()V
      13: getstatic     #26                 // Field hcavsc/subst/Labels$.MODULE$:Lhcavsc/subst/Labels$;
      16: invokevirtual #30                 // Method hcavsc/subst/Labels$.Label:()Lhcavsc/subst/Labels$LabelImpl;
      19: aload_0
      20: invokevirtual #110                // Method hcavsc/subst/Labels$LabelImpl.unwrap:(Ljava/lang/Object;)Ljava/lang/String;
      23: invokevirtual #114                // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      26: ldc           #116                // String Aux
      28: invokevirtual #114                // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      31: invokevirtual #120                // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      34: invokevirtual #36                 // Method hcavsc/subst/Labels$LabelImpl.apply:(Ljava/lang/String;)Ljava/lang/Object;
      37: areturn

No allocation there. Hmm. Well, maybe our concrete LabelImpl instance is doing some secret boxing?

// javap -c -cp target/scala-2.12/classes 'hcavsc.subst.Labels$$anon$1'

  public java.lang.String apply(java.lang.String);
    Code:
       0: aload_1
       1: areturn

  public java.lang.Object apply(java.lang.String);
    Code:
       0: aload_0
       1: aload_1
       2: invokevirtual #27                 // Method apply:(Ljava/lang/String;)Ljava/lang/String;
       5: areturn

  public java.lang.String unwrap(java.lang.String);
    Code:
       0: aload_1
       1: areturn

  public java.lang.String unwrap(java.lang.Object);
    Code:
       0: aload_0
       1: aload_1
       2: checkcast     #21                 // class java/lang/String
       5: invokevirtual #23                 // Method unwrap:(Ljava/lang/String;)Ljava/lang/String;
       8: areturn

No boxing there. That makes sense; in that context, Label is String; the fact that our Label-using code doesn’t know that is irrelevant, because we hid that information using existential types.

So, it is possible to have a newtype mechanism that doesn’t box. You don’t have to wait for the JVM to deliver its own brand of value types; you can even implement it yourself, in Scala, today. They must have had another reason for all this boxing, because “we have to because JVM” is denied by the behavior of Scala-JVM itself.

You aren’t sure what those reasons are, but you decide to port the rest of your code to use the existential Label. Befitting an unboxed newtype, the runtime representation of List[Label] is exactly the same as the underlying List[String], as well as every Option, Either, and whatever else you can think up.

You notice that the erasure for Label is different, but this seems significantly less serious than the boxing problem, so leave it for now. (We will dig into related design decisions later.)

What can you do with a box? What can you do without a box?

Let’s start with a quick comparison of boxing AnyVal and the “type tagging” mechanism we’ve just seen.

Capability AnyVal subclass Type tag
Defining methods normal override; virtual method dispatch available implicit class enrichment only
lbl.getClass Label String
Cast Any to Label checked at runtime unchecked; no wrapper left at runtime
isInstanceOf checked at runtime unchecked; same reason casting doesn’t work
Adding type parameters to methods boxing/unbox penalty no boxing penalty
Wrapping a List O(n): box every element and reallocate list itself O(1), with subst: no allocation, output list eq to input list
Unwrapping a list O(n): reallocate list, unbox each element O(1): eq output with subst. Also possible to make unwrapping a <: (free liftable automatic upcast)
Coinductive type class instances works; boxing penalty applies works; no boxing penalty
Wrapping whole program parts each function must be wrapped to add per-value wrapping/unwrapping O(1): just works with subst

I detect from this matrix a particular theme: AnyVal subclasses give up a lot of capability in the type-safe arena. Consider rewriting a loop that uses Label as state as a foldLeft: you must contend with a new boxing/unboxing penalty, since the state parameter in a foldLeft is type-parametric. It’s more fodder for the persistent higher-order function skeptics among us.

While we know that adding type parameters to our functions improves type-safety, the skeptic will note the boxing penalty, and attribute it to parametric polymorphism. But we know the true culprit.

If AnyVal subclassing taxes type-safe programming in these ways, what is it spending the money on? Simple: support for isInstanceOf, “safe” casting, implementing interfaces, overriding AnyRef methods like toString, and the like.

As type-safe, parametrically-polymorphic programmers, we avoid these features, as a matter of principle and of practice. Some, like checked casting, are simply not type-safe. Some ruin free theorems, like toString, and we would prefer safe mechanisms, like the Show typeclass, to actually tell us at compile time if our programs make sense. Yet, if we use AnyVal subclasses, we have to pay the price for all the programmers that wish to write type-unsafe code, like List[Any] => List[Label]. All is not well in Multiparadigmatic Land.

When will our methods be resolved?

To showcase the relationship of the two approaches to runtime-reflective programming versus statically-proven programming, let’s consider stringification.

Scala provides the toString virtual method on Any. Calling this method is dynamically resolved on the value itself; it is as if every value must carry around a pointer to a function that, given itself, returns a String. We can define this for our original AnyVal-based Label, and so toString on List et al will also work.

// add to class Label
  override def toString = s"Label($str)"

scala> List(fst, snd).toString
res0: String = List(Label(hello), Label(world))

scala> Some(fst).toString
res1: String = Some(Label(hello))

Moreover, this “works” even for the type List[Any].

scala> List[Any](fst, "hi").toString
res2: String = List(Label(hello), hi)

You cannot override toString for our fully-erased Label. After all, every Label is just a String at runtime! (Different types, same class.)

However, the type-safe programmer will recognize List[Any] as a type that, if it occurs in her program, means “something has gone wrong with this program”. Moreover, because toString doesn’t make sense for all types, we use a static mechanism, like the scalaz.Show typeclass. And this works fine for Label, because it is statically resolved by type, not dependent on an implicit runtime member of every Label; in fact, it can only work because it is static!

// add to object Labels
  import scalaz.Show

  implicit val showLabel: Show[Label] =
    Show shows {lbl =>
      s"Label(${Label.unwrap(lbl)})"}

scala> import scalaz.syntax.show._, scalaz.std.list._,
              scalaz.std.option._

scala> List(fst, snd).shows
res1: String = [Label(hello),Label(world)]

scala> some(fst).shows
res2: String = Some(Label(hello))

So if you are doing this kind of programming, it doesn’t matter whether you can’t override toString, or type test, &c; you weren’t doing it anyway. But, aside from a little performance bump, what do you gain from unboxed type-tagging?

When is a Label a String? When is it not?

You notice that subst is at the foundation of several Scalaz constructs like Leibniz and Liskov, and plays a prominent role in the Tag API as well. You decide to add this to your LabelImpl as well.

// in LabelImpl
def subst[F[_]](fs: F[String]): F[T]

// and in val Label
override def subst[F[_]](fs: F[String]) = fs

It’s interesting that you can use this to tag a whole List[String] in constant time:

scala> val taggedList = Label.subst(List("hello", "world"))
taggedList: List[Label.T] = List(hello, world)

It’s also interesting that you can use this to untag a whole list in constant time.

scala> Label.subst[Lambda[x => List[x] => List[String]]](identity)(taggedList)
res0: List[String] = List(hello, world)

Functions and typeclass instance can be tagged or untagged, too.

scala> Label.subst[Lambda[x => (x, Int) => x]](_ substring _)
res1: (Label.T, Int) => Label.T = $$Lambda$3194/964109489@72557d64

scala> import scalaz.Monoid, scalaz.std.string._

scala> Label.subst(Monoid[String])
res3: scalaz.Monoid[Label.T] = scalaz.std.StringInstances$stringInstance$@252798fe

All of this works because subst is really evidence that, deep down, String and Label are the same.

scala> import scalaz.Leibniz, Leibniz.{===, refl}

scala> Label.subst[String === ?](refl)
res4: Leibniz[Nothing,Any,String,Label.T] = scalaz.Leibniz$$anon$2@702af12c

Yet, you ran an experiment earlier to prove that you can’t confuse String and Label; indeed, this still holds true, despite the presence of subst!

scala> "still a string": Label
<console>:21: error: type mismatch;
 found   : String("still a string")
 required: hcavsc.subst.Labels.Label
    (which expands to)  hcavsc.subst.Labels.Label.T
       "still a string": Label
       ^

scala> Label("still a label"): String
<console>:21: error: type mismatch;
 found   : hcavsc.subst.Labels.Label.T
 required: String
       Label("still a label"): String
            ^

Here’s what’s happening: in a sense, (new Label(_)): (String => Label) and (_.str): (Label => String) witness that there’s a conversion between the two types. subst witnesses that there’s identical runtime representation between its own two types. You get to selectively reveal this evidence when it makes writing your program more convenient; the rest of the time, it is hidden.

But I would like to step one level up: this is a design space, and subst as we have seen it isn’t appropriate for all designs. As the author of your own abstract newtypes, you get to choose how much, if any, of this underlying type equality to reveal.

If subst is the right choice

For various reasons, the above is how Scalaz Tag (@@) is defined. If you wish these semantics, you might as well throw everything else away and write

sealed trait LabelTag // no instances
type Label = String @@ LabelTag
val Label = Tag.of[LabelTag]

and take advantage of the convenient tools around subst defined in Tag.Of. But it’s not the only choice! It’s one point in the design space. To do right by your API users, it’s worth exploring that design space a little more.

Type-unsafe code isn’t type-safe

Unboxed existential tagging spreads through your codebase. You feel free to apply it liberally, because you know you aren’t paying the wrapping costs of AnyVal subclasses; all these new abstraction layers are pure type-level, and fully erased.

You receive a “bug report” from a fellow developer that this expression never seems to filter out the non-label Strings.

(xs: List[Any]).collect{case t: Label => t}
<console>:16: warning: abstract type pattern
 hcavsc.translucent.Labels.Label.T
 (the underlying of hcavsc.translucent.Labels.Label)
 is unchecked since it is eliminated by erasure
       (xs: List[Any]).collect{case t: Label => t}
                                       ^
<console>:16: warning: The outer reference
 in this type test cannot be checked at run time.
       (xs: List[Any]).collect{case t: Label => t}
                                    ^

Your mind on safe pattern matching practice, you add def unapply(s: String): Option[T] to LabelImpl and counsel preference for the form case Label(t) => ..., as well as to not ignore -unchecked warnings.

You get another bug report that this always seems to succeed.

(s: String).asInstanceOf[Label]

Repeating your advice about warnings, you start to wonder, “where is this kind of code coming from?”

Someone else complains that they want to make T extends Ordered[T], and can’t fathom where the code should go. You advise the static approach of implementing the Ordering typeclass instance instead for T, wonder how deep the object-orientation hole goes, and forward the link about the typeclass pattern again, too.

Suppose you went back to AnyVal

We’ve seen that AnyVal subclasses could have been incredibly cheap, but aren’t, so as to support “features” like checked casting. Who’s going to foot the bill?

  1. Oh, this allocates when passing through polymorphic contexts, but not monomorphic ones? Avoid polymorphic code.
  2. Oh, this extra type-safety adds all this allocation? Type safety is expensive at runtime; we need to stick to String.
  3. We can’t do any better; the JVM limits the possibilities. You have to pay for runtime class wrapping if you want a wrapper type.

In this article, I have demonstrated that none of these conclusions are correct. However, only a tiny minority of Scala practitioners will ever read this article, and I will not blame the rest for drawing these seemingly straightforward inferences, ultimately faulty as they are.

The real cost of AnyVal subclasses is not all the needless memory allocation. The real cost is the damage to the practice of type-safe programming in Scala. It’s in all the curious developers who sought to add a little more type safety to their programs, only to find themselves penalized by the runtime, once bitten. It’s in the reinforcement of this attitude towards abstraction that they’ll continue to carry with them, the next time an opportunity presents itself. It’s a missed opportunity for pure type-level thinking, all so that asInstanceOf “works”.

See “…and the glorious subst to come” for further development of the ideas in this article.

This article was tested with Scala 2.12.1, Scalaz 7.2.10, and Kind Projector 0.9.3. The code is available in compilable form for your own experiments via Bazaar.

Saturday, December 3, 2016

Part 3: Working with the abstract F

This is the third of a four-part series on tagless-final effects.

Previously

  1. Introduction, motivation, and the core techniques;
  2. The role of Monad.

The freedom of erased abstraction

There is something supremely elegant about the way values of the F-type flow between effectful programs and their interpreters.

Consider the pair of copy and the IOFulFSAlg.

  1. The first F is created by IOFul; in fact, the copy method cannot create any on its own. It knows that F = Function0, so can use () => ... to create its result.
  2. This value flows to copy. But copy doesn’t know that F = Function0; the value is “actually” callable likeSo(), but that will not compile!
  3. copy passes it back to the interpreter’s bind. All of a sudden, callability is back, so it can be implemented!

Each time F values cross the boundary between effectful program and interpreter, this knowledge appears and disappears in exactly the way that guides us to keep effectful programs properly abstract, that is, agnostic to the representation of the effects.

The way that representation appears and disappears in just the right places is a hallmark of parametric polymorphism. By contrast, consider “hiding behind a class’s public interface”, a hallmark of the object-oriented polymorphic way of thinking:

  1. If the interpreter is embedded directly within the F class, then it can only safely work with exactly one F, the receiver. bind and such must be implemented with casting, which is by definition unsafe.
  2. If the interpreter is separate, it must cast every F.

Regardless, the presence of a class implies manual wrapping and unwrapping, to apply the Adapter pattern; in Scala, this is a one-time cost. Even AnyVal subclasses box quite readily.

We can observe that there is no runtime wrapper quite readily in Scala by doing unsafe casting on the results of algebra methods.

def unsafeRead(source: String, alg: FSAlg): String =
  alg.readFile(source)
     .asInstanceOf[() => String]
     .apply()

If we pass our IOFulFSAlg to this function, it will work! The F is really (!) what the interpreter thinks, a Function0.

However, if we pass the test interpreter, it will just crash. Effectful programs can only do this by cheating the interpreter out of its job; honest programs do not do this.

I explained all this to demonstrate that tagless-final relies on a purely type-level form of abstraction. It cannot be meaningfully enforced without a type checker with parametric polymorphism. If you do not have parametric polymorphism, it is difficult to say that abstraction is happening at all; it will certainly be extremely difficult for a programmer unversed in effect algebras to stick to the abstract interface, without the aid of enforcement.

In Scala, there’s a “hole” in this abstraction, demonstrated by the partially-checked cast to () => String above. Scala permits uncontrolled type tests in its pattern matching, not just those useful for ADT pattern matching, so it is also possible to use this to violate the abstraction even further.

def andInATest(alg: FSAlg): Boolean =
  alg.readFile("irrelevant") match {
    case _: Function0[_] => false
    case _: Function1[_, _] => true
  }

Parametricity does not let us determine more about F than that explicitly provided in FSAlg; the alg certainly did not supply information about how to break the abstraction.

This is why “runtime type information” or “reified generics” are neither benign nor harmless. I’ve lost the absolute guarantee that the effectful program isn’t breaking the purely type-level abstraction.

Luckily, the compiler doesn’t encourage this sort of thing either. In the Scalazzi Safe Scala Subset we take it back to a rule, by forbidding use of type tests. Thus, full abstraction is restored.

Is the type parameter really necessary?

The presence of a type parameter on the abstract type F—making F a “higher-kinded type”—gets in the way of implementing this in Java. Perhaps the best way to see why this type parameter is so important is to see a case where it is not.

Java programmers confronted with a constellation of methods that produce substrings will often, “for performance”, pass around a StringBuilder or Writer as an argument, changing all the functions into void-returning mutations.

Tagless-final style offers a far more elegant way to get this runtime optimization.

trait StrAlg {
  type S
  def fromString(str: String): S
  def append(l: S, r: S): S
}

object SBStrAlg extends StrAlg {
  type S = StringBuilder => Unit

  def fromString(str: String) =
    sb => sb.append(str)

  def append(l: S, r: S) =
    sb => {
      l(sb)
      r(sb)
    }
}

def numbers(alg: StrAlg): alg.S =
  (1 to 100).foldLeft(alg.fromString("")){
    (acc, m) => alg.append(acc, alg.fromString(m.toString))
  }

And so numbers is freed from the admonition not to iteratively concatenate Strings, even if you are too lazy to implement the more efficient interpreter later! We also have this nice fusion property: numbers is fully decoupled from what we do with its results, even if we arrange for fromString to write to an exotic output stream of some sort.

However, any S for a given interpreter is like any other S. There’s no behavioral way to distinguish between different sorts of S in our algebra. This is fine when we want to represent exactly one (stringish) thing, but a typical algebra needs more, and so does a typical effectful program.

Consider FSAlg. It returns two sort of results, F[String] and F[Unit], which is already one too many for the so-called “star-kinded” representation employed by StrAlg. Say we faked it with an empty string for the F[Unit] case.

How would you represent an effectful program that parses a List[Int] out of a file? With FSAlg, it is easy:

def listONums(source: String, alg: FSAlg): alg.F[List[Int]]

How would you get this list, without a type parameter? Well, you’d have to interpret F to a String. But now, this function that returns List[Int] runs the interpreter, so it cannot be used as a component of abstract effectful programs. It does not compose.

Higher-kinded types like FSAlg’s F are the foundation of the appeal and useful applicability of the tagless-final pattern. If we don’t have them, or we stubbornly refuse to use them, we’re doomed from the start.

CanBuildFrom has appeal to higher-kinded skeptics, but if you attempt to integrate something like it into FSAlg, yet still write signatures like listONums, you will never finish writing all the abstract types and map instances required to have a general-purpose algebra.

Is copy a functional program?

Suppose that we wrote a version of copy, or any effectful program, that directly referred to IOFulFSAlg to produce effects, rather than taking an algebra argument and leaving F abstract. It would be hard to argue that it is still a purely functional program. However, the case for its being functional is relatively simple in the abstract case. Since the only difference is taking an argument, why is that?

The usual way in which we make programs more functional is to divide a side-effecting program into two parts: one to make decisions and purely produce a value representing those decisions, and one to “interpret” that value. This forms an obvious, structural abstraction.

To accept copy as a pure function requires you to broaden your acceptance of abstraction to include the type level. Because copy does not only receive functions in an algebra as an argument, it also receives a type, F, as an argument. By means of this abstraction, we form an “effectful shell” of a shape that would not work without the ability to abstract at the type level.

On finally, on tagless

The pure type-level approach is why this is tagless. Other approaches to custom algebras, such as using free monads, require a runtime “tag” to be created and picked up by the interpreter.

In tagless final style, we skip the tag step and just have the interpreter emit the final form of the effect right away.

Drawback: decomposition required

One drawback of the tagless-final style is that it imposes a specific structure on the interpreters you write.

When you interpret a free monad structure, you have a few methods of interpretation available. One is “natural transformation”; this is similar to what you do with tagless-final, but with a chunk of boilerplate. However, you can also write the interpreter as a tail-recursive loop. This loop can conveniently do things like update state variables, notice when certain actions happen after certain other actions, and so on.

By contrast, tagless-final style requires you to take that interpretive logic and encode it in data structures, each returned by a method specific to that action. Each algebra method acts as an isolated component, with no relation to others that may be called in the same effectful program.

Luckily, while tagless-final requires you to have a uniform, functional representation of effects per interpreter, it doesn’t say anything else about what that structure is. So to the extent that you want the extra features of a free monad structure for interpretation power, you can incorporate one. Moreover, this remains invisible to the effectful programs themselves. The underlying style remains tagless; any tags present in the system are as you choose, for your interpreters’ convenience.

This article was tested with Scala 2.12.0.

Part 2: The role of Monad

This is the second of a four-part series on tagless-final effects.

Previously

  1. Introduction, motivation, and the core techniques.

The role of Monad

A useful effectful program needs some way of not only producing F effects, but combining and manipulating them. There are no methods on F from the perspective of effectful programs to satisfy this need; the interpreter must provide.

Suppose we have a basic copy method.

def copy(source: String, dest: String, alg: FSAlg)
    : alg.F[Unit] = {
  alg.readFile(source)
  // somehow get the String from F[String]...
  alg.writeFile(dest, contents)
}

It is tempting to include an evaluator in the algebra, but resist! Don’t include the runner in the algebra!

The appeal of this temptation lies with its similarity to an imperative style. You write

val contents = alg.run(alg.readFile(source))
alg.writeFile(dest, contents)

However, not only is this no longer functional programming, it is very, very hard to think about when this readFile side effect happens in relation to the other side effects in a side-effecting interpreter.

  1. It could happen before all of the F-controlled side effects.
  2. It could happen before one or more of the side effects you expect to happen first.
  3. it could happen after one or more of the side effects you expect to happen later.
  4. Any mix of the above can happen in the same interpreter.

Instead, we can supply combinators in the algebra to allow effects to be sequenced. Here’s a combinator for FSAlg that will allow copy to be written.

def bind[A, B](first: F[A])(next: A => F[B]): F[B]

This is called monadic bind, and can be written for both FSAlg interpreters.

// IOFul
def bind[A, B](first: () => A)(next: A => () => B): () => B =
  () => next(first())() // we can't call `first` now; that would
                        // break the delay of its side-effects

// TestMap
def bind[A, B](first: Directory => (Either[Err, A], Directory))
              (next: A => Directory => (Either[Err, B], Directory))
    : Directory => (Either[Err, B], Directory) =
  dir => {
    val (ea, dir2) = first(dir)
    ea match {
      case Left(err) => (Left(err), dir2)
      case Right(a) => next(a)(dir2)
    }
  }

With this function in the algebra, we can implement an effectful copy in a functional way.

def copy(source: String, dest: String, alg: FSAlg)
    : alg.F[Unit] =
  alg.bind(alg.readFile(source)){
    contents =>
      alg.writeFile(dest, contents)
  }

The broad applicability of this pattern to sequential effect problems—of both the pure and side sort, as exemplified by our two interpreters—is why Monad is so commonly used for problems like this.

Chances are that your effectful programs will need to perform F effects in this sequential way, where later effects (e.g. writeFile) need to be calculated based on the resulting values (e.g. contents) of earlier effects (e.g. readFile). So, as a design shortcut, you ought to incorporate Monad into your algebra.

Don’t reinvent the Monad wheel

It may be tempting to avoid incorporating a library of functional abstractions such as Scalaz or Cats into your program. This is a mistake; these libraries incorporate a large number of functions that are useful for working with abstract effects, as well as a large number of pre-built and tested implementations of bind and many similar combinators, ready for reuse in your interpreter.

These libraries are foundational because they cover so many common tasks for algebraic abstractions. For example, take a pure effectful program that reads a list of files.

def readFiles(names: List[String], alg: FSAlg)
    : alg.F[List[String]] =
  names.map(alg.readFile(_))
  //       ↑
  // [error] type mismatch;
  //  found   : List[alg.F[String]]
  //  required: alg.F[List[String]]

This doesn’t work because the Fs must be sequenced and returned from the method, not dropped on the floor. (foreach is completely useless in these programs for a similar reason.) An author of a pure effectful program can solve this problem herself, presuming the algebra includes the other essential monad function point.

// in algebra
def point[A](a: A): F[A]

// readFiles
  names.foldRight(alg.point(List.empty[String])){
    (name, rightF) =>
      alg.bind(alg.readFile(name)){hContents =>
      alg.bind(rightF){rest =>
        alg.point(hContents :: rest)
      }}}

With Scalaz, not only does abstract monad syntax give you a nicer way to write this fold and list reconstitution:

names.foldRight(List.empty[String].point[alg.F]){
  (name, rightF) =>
    for {
      hContents <- alg.readFile(name)
      rest <- rightF
    } yield hContents :: rest
}

But you wouldn’t bother, because the library already includes this function, for List and several other types.

names.traverse(alg.readFile(_))

Now all we did was change map to traverse.

Using a good foundational functional library is especially important for newcomers to monadic abstraction, because it contains in so many common patterns a demonstration of the proper way to work with F.

Monad reuse in FSAlg

Let’s rewrite what we have so far to incorporate a standard Monad into FSAlg.

First, we eliminate bind and point, substituting a Monad typeclass instance into FSAlg.

import scalaz.Monad

implicit val M: Monad[F]

For IOFulFSAlg, Scalaz already includes an implementation, which we can find by importing.

val M = {
  import scalaz.std.function._
  Monad[F]
}

Scalaz does have a Monad for the test algebra’s F, but using it will require some rewriting of our existing interpreter functions, so let’s just port over the previous bind implementation.

val M = new Monad[F] {
  // bind as above under TestMap
}

[error] object creation impossible, since method point in
 trait Applicative of type [A](a: => A)tfe.TestMapAlg.F[A]
 is not defined
  val M = new Monad[F] {
              ^

We didn’t get around to implementing point, the “effect-free” combinator, and the compiler asks for that now.

def point[A](a: => A): F[A] =
  d => (Right(a), d)

copy can be written in a method-calling style.

def copy(source: String, dest: String, alg: FSAlg)
    : alg.F[Unit] = {
  import alg.M.monadSyntax._
  alg.readFile(source).flatMap{
    contents => alg.writeFile(dest, contents)
  }
}

But it could also be written by calling alg.M.bind directly. Implementers’ choice.

For the traverse method, we need two imports, using the “à la carte” import style, and to pass along the Monad.

import scalaz.syntax.traverse._
import scalaz.std.list._

// and in the method
  names.traverse(alg.readFile(_))(alg.M)

(Use the type-parameter style for algebra definition to avoid this unfortunate failure of implicit resolution.)

Finding more functional combinators, like catching errors

In the type signatures for FSAlg, we haven’t really accounted for the fact that these functions can fail in real-world interpreters, and even in the test interpreter. Well, we have, in the design of the F choices, but that just delays any error until the caller runs the F.

  1. For IOFulFSAlg, calling the F function will throw, effectively halting the sequence.
  2. For TestDir, errors are represented with Left; reading the bind implementation, you can see that the Left case means that next is not called, effectively short-circuiting the program just like our IOFul does with exceptions.

This isn’t part of the tagless-final pattern; it’s a design choice. When we didn’t include error reporting in the return type of functions like writeFile that certainly can fail in practice, we implied that every interpreter’s F would account for errors. That’s not a convention, it’s an unavoidable outcome of this design decision.

If we want to write effectful programs that can handle errors, which is also a choice itself, we have a couple options.

1. The “explicit” strategy

For functions that can fail, include a representation of error cases inside the F. So FSAlg might have a different signature for readFile:

def readFile(name: String): F[Either[Err, String]]

This is the “explicit” strategy, and has a few major advantages:

  1. F can be simpler, because it need not model errors.
  2. The algebra can have a mix of failing and non-failing functions. The user of the algebra can tell which is which by looking at the return types.
  3. Effectful program authors can delineate which parts of their program may have unhandled errors.

2. The “implicit” strategy

Incorporate an error-recovery function to convert an error into something that can be handled. Choose an error type, such as E, and add such a function as this:

def catchError[A](mayFail: F[A], recover: E => F[A]): F[A]

Assuming that you have incorporated Monad or at least its weaker relative Functor into your algebra, as we have, this is precisely equivalent in power to

def catchError[A](mayFail: F[A]): F[Either[E, A]]

except that its similarity to the try/catch form is more obvious. Alternatively, you might provide for some kind of filtering, so you drop some errors but not others; perhaps recover might be a PartialFunction, or you might take an additional argument that somehow explains to the interpreter which errors you want to handle or let go.

You may also wish to include the equivalent of try/finally, bracket:

def bracket[A](first: F[A], cleanup: F[Unit]): F[A]

This “implicit” strategy has its own set of advantages:

  1. The potential for failure need not be noted on each algebra method; it is assumed.
  2. Effectful programs can allow errors to percolate up “automatically”, so to speak. (Doing this with the “explicit” variant is possible, but a little tricky.

Unfortunately, there is no way to type-check that an effectful program in the “implicit” style has handled all errors, because F, no matter what, represents a program that might fail in this design.

3. The “someone else’s problem” strategy

As with #2, but don’t provide any means of recovery.

This is a question of delineated responsibility. For many effectful programs, it simply isn’t meaningful to recover from errors originating in the interpreter, and it’s always more appropriate for them to be handled by the invoker of the program, as an “early termination”.

In such a situation, you can communicate this by leaving error-catching out of the interpreter. Though you might want to at least document that you intended to leave the functionality out, and didn’t simply forget it!

None of these strategies is more or less pure; they all preserve the purity of effectful programs.

There is a broader problem here, though: how can you find type signatures like catchError, that convert concepts that seem to require side effects or special support, into plain algebra calls that work for pure FP programs? One great resource is the Haskell base library. Haskell requires all programs, even effectful ones, to be written using only pure constructs and ordinary functions, so many such problems have been solved there. catchError comes from the MonadError typeclass, which supplies a mini-algebra much like FSAlg, but specifically for throwing and catching errors.

  1. Break an “effectful” idea down into a primitive concept you’d like to support.
  2. Research how this is handled in the IO algebra for Haskell.
  3. Replace IO with F and incorporate into your algebra.

Here are the implementations of catchError and bracket for our two interpreters. One test for your choice of effectful API is whether interpreters can implement it. I’ve chosen the Err type to represent errors to effectful programs, but the choice is yours.

import scala.util.control.NonFatal

// IOFulFSAlg
def catchError[A](mayFail: F[A],        // () => A
                  recover: Err => F[A]) // Err => () => A
    : F[A] =
  () => try mayFail() catch {
    case NonFatal(e) => recover(Err(e.getMessage))()
  }

// TestDirFSAlg
def catchError[A](mayFail: F[A],
                  recover: Err => F[A]): F[A] =
  dir => {
    val (result, dir2) = mayFail(dir)
    result match {
      case Left(err) => recover(err)(dir2)
      case Right(a) => (Right(a), dir2)
    }
  }

// IOFul
def bracket[A](first: F[A], cleanup: F[Unit]): F[A] =
  () => try first() finally cleanup()

// TestDir
def bracket[A](first: F[A], cleanup: F[Unit]): F[A] =
  dir => {
    val (result, dir2) = first(dir)
    val (_, dir3) = cleanup(dir2)
    (result, dir3)
  }

When should I take an F argument?

The functions so far follow a pattern that is common for algebra that include Monad. Specifically, our algebra API comes in two flavors:

  1. Specific functions like readFile that carry out some task specific to this domain; these return an F but do not take an F as an argument.
  2. Abstract effect combinators like map, bind, catchError that likewise return an F but also take one or more as arguments.

This pattern arises because for functions like readFile, this is the most useful signature in the presence of Monad.

With Monad in place, we can easily implement copy’s writing step in terms of flatMap or bind. Without it, we might be tempted to solve the problem of calling writeFile by adding an F argument.

// don't do this
def writeFileBad(filename: String, contents: F[String]): F[Unit]

Now we can call writeFileBad directly with the result of readFile. But what about these?

  1. Suppose we want to process the contents of the source file before writing them. Maybe we want to split in two lines and filter some out (like grep), or sort them?
  2. Suppose we want to read two or more files, writing all of their contents to the target file?
  3. Suppose we wanted to read a file that contains, itself, a list of filenames, and we want to concatenate all of the contents of those files and put them into the target file?

The redesigned writeFileBad is good for only one sort of thing: things like copy. Monad is so ubiquitous partly because it is flexible enough to solve all these combination problems, and many more besides.

Effectful programs can all split their demands on their algebras into wanting to call these two sorts of primitive algebra functions; a program calling writeFileBad can call writeFile and flatMap instead, and will be better for it. Learning to recognize when you’ve accidentally given a specific function the job of a generic combinator is a highly useful skill for the design of abstract APIs.

Bending the mind the right way

The most difficult part of learning to use this pattern is learning how to design usable function types for your effect algebras. I suggested earlier looking at IO in Haskell, because they’ve encountered and solved such problems many times, because they had to.

So this is an attitude worth adopting. Demand a purely-functional approach in your effectful programs. That so many purely functional effect types are widely known is a testament to the unwillingness to compromise the integrity of the Haskell model or abandon the reasoning power that comes with referential transparency.

It’s a good idea to have two interpreters, one like IOFul and one like TestMap, or at least to imagine both. When adding a new function, think

  1. Will using this function in effectful programs cause effects before “running” the returned F?
  2. If I use this with a side-effect-free interpreter, will running F have side effects?

If the answer to either of these is “yes”, change the type signature.

Still to come

  1. Working with the abstract F;
  2. How much is this “dependency injection”?

This article was tested with Scala 2.12.0 and Scalaz 7.2.8.

Tagless final effects à la Ermine Writers

This is the first of a four-part series on tagless-final effects.

“Finally tagless” notation is a nice way to get many of the benefits of free monads without paying so dearly in allocation of steps.

Watching John DeGoes’s 2015 presentation of free applicatives, it struck me that I didn’t quite like the notation of “finally tagless” demonstrated therein. To me, threading an algebra compares unfavorably to having an algebra in the program’s scope. The latter is the approach taken by the implementation of the Ermine Writers.

I think the style of effectful programs written like this will be appealing to programmers transitioning from out-of-control side effects to functional programming. By avoiding the sequence of intermediate command data structures that characterizes free monad effects, it saves significant runtime cost; the implementation of the interpreter should also be more obvious to the newcomer. On the other hand, it preserves the idea of multiple interpreter-dependent output types, and with it the testability benefit.

While this style does away with the abstract command structures, it preserves the limitations on available effects provided by good effect libraries. It does this by means of type-level abstraction rather than by means of a specific effect structure. This means that the abstraction is enforced, but erased; the concrete structures are the same as the interpreter output, though the code choosing effects can’t tell what that is.

There’s no library for me to mandate, or that you have to adopt. I recommend that you have a library like Scalaz or Cats with Monad, IO (for interpreters), and related functionality available, but it’s not a requirement. The code involved in adopting this style is specific to your use case.

While this is a good alternative to free monads, eff, and the like, it integrates well with them, too. You can combine these effects with other systems as convenient, either to implement interpreters or to produce effects within effect-abstract code, especially if you incorporate Monad.

How can this all be accomplished? With higher-kinded types, that is, abstraction over type constructors.

Declaring an algebra

As with other designs for constrained effects, you need to declare the various effects you’re going to support. For the sake of a little exoticism, I’m going to declare a simple filesystem interface.

trait FSAlg {
  // I'm using an abstract type constructor (i.e. FSAlg
  // is higher-kinded) for the effect type, but this
  // converts readily to a type parameter F[_] on FSAlg,
  // like Scanner in Ermine
  type F[A]

  def listDirectory(pathname: String): F[List[String]]

  def readFile(pathname: String): F[String]

  def writeFile(pathname: String, contents: String): F[Unit]
}

Writing an effectful program

Instances of FSAlg supply concrete operations for the F type constructor, and so must choose a concrete F as well. Concrete programs that choose which effects to perform should be abstract in F under this design approach.

Instances of FSAlg can be defined in typeclass style (in which case you should use a type parameter for F instead of a type member), or passed as normal arguments.

The other thing ‘effectful programs’ do in this style is return an F; specifically, the F associated with the algebra instance being passed in. In this way, this style radically departs from “dependency injection” or “the strategy pattern”—the “dependency” has a concrete influence on the public return type.

Organizing the program as a set of standalone methods makes it easy to use path-dependent style to return the correct type of effect, when using a type member.

def write42(alg: FSAlg)(pathname: String): alg.F[Unit] =
  alg.writeFile(pathname, "42")

A typeclass version would look more like

def write42[F[_]](pathname: String)(implicit alg: FSAlg[F]): F[Unit] =
  alg.writeFile(pathname, "42")

To avoid passing around the alg everywhere, you might put a whole group of methods under a class, and have the class take the algebra as a constructor parameter. This is a straightforward translation for the typeclass or simple type parameter approach.

class MyProg[F[_]](implicit alg: FSAlg[F]) {
  // several methods using F and alg
}

This is the organization style of Ermine Writers; each individual Writer class (e.g. HTMLWriter) is similar to MyProg.

Doing this with a type member is a little trickier; if you simple put a alg: FSAlg argument in the constructor, you’ll “forget” the F, existential-style. You can either put a bounded type parameter on the class for the alg:

class MyProg[Alg <: FSAlg](alg: Alg)

or a higher-kinded parameter, via the Aux pattern.

// object FSAlg
  type Aux[F0[_]] = FSAlg {type F[X] = F0[X]}

// replacing MyProg
class MyProg[F[_]](alg: FSAlg.Aux[F])

I think the latter yields easier-to-understand method types and type errors, but all of the above alternatives have equal power. So choose whatever seems nice, and change it later if you like.

Writing an interpreter

When writing the effectful program, you’re condemned to be free: you have to choose the effects to perform. The “interpreter”, which “executes” the effect, is more of a guided exercise. You must extend the algebra trait, FSAlg, implementing all of the abstract members.

object IOFulFSAlg extends FSAlg {
  import java.io.File, java.nio.file.{Files, Paths}

  type F[A] = () => A  // your choice!

  def listDirectory(pathname: String): () => List[String] =
    () => new File(pathname).list().toList

  def readFile(pathname: String): () => String =
    () => new String(Files readAllBytes (Paths get pathname))

  def writeFile(pathname: String, contents: String): () => Unit =
    () => Files write (Paths get pathname, contents.getBytes)
}

The key is to choose an F type—you can almost consider it an implementation detail of the class—that will allow you to implement the methods without side-effecting when they are called. I’ve made a good starter choice above, but the real magic happens when I choose more interesting Fs.

A test interpreter

With F abstract in “real” programs, we can choose different ones for different interpreters. Here we use one that allows simulation of the algebra methods without performing any side effects or mutation.

final case class Directory(
  listing: Map[String, Either[String, Directory]])

final case class Err(msg: String)

object TestMapAlg extends FSAlg {
  type F[A] = Directory => (Either[Err, A], Directory)

  private def splitPath(p: String) =
    p.split('/').toList

  private def readLocation(dir: Directory, p: List[String])
      : Option[Either[String, Directory]] =
    p match {
      case List() => Some(Right(dir))
      case k +: ks =>
        dir.listing get k flatMap {
          case r@Left(_) =>
            if (ks.isEmpty) None else Some(r)
          case Right(subd) => readLocation(subd, ks)
        }
    }

  def listDirectory(p: String): F[List[String]] =
    dir => (readLocation(dir, splitPath(p)) match {
              case None => Left(Err(s"No such file or directory $p"))
              case Some(Left(_)) =>
                Left(Err(s"$p is not a directory"))
              case Some(Right(Directory(m))) => Right(m.keys.toList)
            }, dir)

  def readFile(pathname: String): F[String] =
    dir => (readLocation(dir, splitPath(pathname)) match {
              case None => Left(Err(s"No such file or directory $pathname"))
              case Some(Right(_)) =>
                Left(Err(s"$pathname is a directory"))
              case Some(Left(c)) => Right(c)
            }, dir)

  def writeFile(pathname: String, contents: String): F[Unit] =
    dir => {
      def rec(subdir: Directory, path: List[String]): Either[Err, Directory] = 
        path match {
          case List(filename) => 
            Right(Directory(subdir.listing + ((filename, Left(contents)))))
          case dirname +: subpath =>
            val subsubdir = subdir.listing get dirname
            subsubdir match {
              case Some(Left(_)) =>
                Left(Err(s"$dirname is not a directory"))
              case Some(Right(d)) =>
                rec(d, subpath)
              case None =>
                rec(Directory(Map()), subpath)
            }
        }
      rec(dir, splitPath(pathname)) match {
        case Left(e) => (Left(e), dir)
        case Right(newdir) => (Right(()), newdir)
      }
    }
}

Executing the effects

The implementations of effectful programs in this scheme can’t tell what F is. But the code that chooses the interpreter and passes it to that abstract program does know. Accordingly, the type returned by invoking the program will change according to the F type of the interpreter you pass in.

scala> write42(IOFulFSAlg)("hello.txt")
res1: IOFulFSAlg.F[Unit] = <function0>

scala> res1()
// hello.txt appears on my disk. Guess what's in it?

scala> write42(TestMapAlg)("hello.txt")
res2: TestMapAlg.F[Unit] = <function1>

scala> res2(Directory(Map()))
res4: (Either[Err,Unit], Directory) =
  (Right(()),Directory(Map(hello.txt -> Left(42))))

As the invoker of the interpreter, this very concrete level—the first truly concrete segments of code I’ve shown in this post—is responsible for supplying the “execution environment”. It’s here that side effects—if any!—happen. For the first example, we can invoke the zero-argument function and watch the side effects happen. In the second case, we can make up a test Directory and inspect the resulting tuple for the final Directory state, and error if any.

Otherwise, the usual rules of the interpreter pattern apply; by inventing new instances of FSAlg, we can choose different things that should happen in the effects of the various algebra methods.

Effects must be delayed

You may be tempted to use an F like this:

type F[A] = A

and then do something like the IOFul implementation without the leading () =>. This will seem to work, but effectively prevent effectful programs from doing functional programming.

We can see why via a simple counterexample. Consider this simple program,

readFile("hello.txt")
// ...
writeFile("hello.txt", "33")

According to the rules of FP, the below program must always do the same thing as the above program.

val wf = writeFile("hello.txt", "33")
readFile("hello.txt")
// ...
wf

If calling writeFile performs a side effect right away, this will not hold true.

For a similar reason, naive memoization of the side effects’ results will also break FP. Consider this program:

readFile("hello.txt")
// ...
writeFile("hello.txt", "33")
// ...
readFile("hello.txt")

In FP, I can factor these two readFile calls to a val. If readFile memoizes with a local variable, though, the second use of that val gives the wrong file contents. (Of course, if you don’t have any effects in your algebra that can change the results of later effects, this is no problem!)

Otherwise, you don’t have to be wildly principled about the purity of your interpreters’ F choices, while still granting the benefits of pure FP to your effectful programs. Ermine writers’ interpreters use something like

type F[A] = java.sql.Connection => A

and it’s perfectly fine.

Relaxed rules in the interpreter from abstraction in the program

“Local mutation” is a broadly accepted way to implement pure functions. Even Haskell supports it, still without breaking the rules of the language, via the ST abstraction (covered in chapter 14 of Functional Programming in Scala). This is usually taken to refer strictly to mutation local to a certain dynamic scope, though; the universal quantification trick in ST is precisely meant to enforce the dynamic scope of mutable variables.

The Ermine Writers show us a different story, though. When using a concrete type for hiding side effects, like IO, you must be very careful to hide the runner, lest the side-effect be exposed.

Type-level abstraction such as in tagless-final changes both of these parts of the typical ‘local mutation’ method of design.

  1. Instead of being dynamically scoped, control over side effects is statically, or lexically scoped, to the code in the interpreter. This lends a new dimension to the idea of a side-effecting “shell” for a pure program—the “shell” is syntactic, not based on the patterns of function calls at runtime.
  2. The only care required to not break a type-level abstraction is to not pass something that would break it in the algebra, and to follow the Scalazzi Safe Scala Subset and avoid use of reified type information. (This will be discussed in more detail in part 3, “Working with the abstract F.)

The degree to which you can “break the rules” in your interpreter is directly proportional to the degree to which you enforce abstraction in effectful programs. Following the approach of examples in this article, there remains a great deal of freedom to experiment with interpreters that exploit your favorite mutation techniques to speed up interpretation. For example, a less naive memoization of readFile and listDirectory is admissible under the current algebra.

By contrast, if you expose too much detail about the F functor to effectful programs, then your interpreter becomes severely constrained. Suppose you define in the abstract algebra

def asReader[A](fa: F[A]): () => A

This may be expedient, but effectively demands that every interpreter works like IOFul; others cannot be safely implemented.

Still to come

  1. The role of Monad;
  2. Working with the abstract F;
  3. How much is this “dependency injection”?

Also, Adelbert Chang is covering “Monadic EDSLs in Scala” in a series over on Typelevel blog; he’s taking a different route to many of the same ideas as this series. I suggest checking out both his series and this one to find the most comfortable route for you.

This article was tested with Scala 2.12.0.