I previously wrote “Values never change types”, whose central thesis statement I hope is obvious. And this still holds, but there is something I left unsaid: values do not have identity, so the notion of “mutating” them is as nonsensical as “mutating 4”. And the formal system of Scala types treats objects with identity similarly, by not permitting them or their variable aliases to change type, even though they are not quite values. But this is a design decision, and other choices could have been made.
There are very good reasons not to make other choices, though. Other type systems come with features that come very close to making the opposite design choice; by imagining that they went just a little farther down this garden path, we can see what might have been.
Refinement, flow, or occurrence typing, by any name
In Flow and TypeScript, when you test properties of a value in
a variable, you can “change the type” of that variable. For example,
you could have a let s: any
; if you write an if
block that tests
whether s
is a string, the type of s
—at compile-time, mind
you—“changes” to string
within the if
body. Within the body of
that if
, you could perform further tests to refine s
’s type
further; you might also have various other if
blocks alongside
checking for other types, so that s
might variously “change” into
number, function, object with a whatsit
property whose type is
another object with a whosit
property, and so on.
So, instead of having a single type attached to a lexical variable over its entire scope, a variable has several types, each tied to a block of code that uses the variable. It is an order more sophisticated, but still tied to the lexical structure of the program, as if the variable has multiplied to honor all the faces it might have. This is a great way to model how people are writing programs at the type level without overly complicating the formal system, which still must always obey a complete set of sound rules.
Contradictory refinement
In the systems I’ve described, no further refinement can contradict a
prior one. So once you determine that a variable is a string
, it’s
not going to turn out to be a number later; at most, it can get more
specific, like being proven to be a member of a known static set of
strings. So this way you know that inner blocks cannot know less than
outer blocks about the nature of a variable; that is what I mean by
“tied to the lexical structure.”
What “real JavaScript code” could be written that would violate this assumption?
function foo(arg) { let s = arg if (typeof s === 'string') { // refines s to type string s = {teaTime: s} // "point of no return" if (s.teaTime === "anytime") { drinkTea(s) ...
The first if
test establishes a block in which s
’s type is
string
. Then we pull the rug out from under the type-checker by
assigning to s
; with that assignment, it is no longer true that s
is a string. Why does this make type-checking more complex?
Let’s twist and tangle the program to support our beloved mutation
The type of the variable s
no longer follows the block structure of
the program, in the way we usually perceive blocks in a structured
program. That’s because the fact established by the outer if
test is
suddenly invalidated partway through the block. So our first problem
is one of raising the complexity burden on human interpretation of the
program—the reader can no longer assume that the specificity of a
variable’s type only increases as you move inward, reading the block
structure of the program—but it is not fatal in itself, at least for
this example. We can salvage the model via the Swiss Army knife of
semantic analysis, the continuation-passing style (CPS) transform.
function foo(arg, k0) { return (s => if (typeof s === 'string') { (_ => if (s.teaTime === "anytime") { drinkTea(s, k0) } )(s = (teaTime: s)) } )(arg)
Now it is still possible for inner blocks to contradict outer blocks, but at least this is only possible at the block level. So, by “merely” revisualizing our programs in terms of the flow of continuations rather than the visibly apparent block structure, we can sort of still think of the type of variables as a “block”-level concern, as it was before.
Unluckily, performing a CPS transform in your head with all the code you see is a kind of “reverse Turing Test”, something that an AI would have to not be able to do very well in order to fool us into thinking it was human. So no matter what, we are stuck with a significant new complication in our formal model.
But not a fatal one. Yet.
Loop unrolling according to the phase of the moon
What will prove fatal to the formal model of type-changing mutation is delay. Let us see how the seeds of our destruction have already been sown.
while (forSomeTimeNow()) { log(s.substring(1)) if (itsAFullMoon()) { s = {teaTime: s} } drinkTea(s) }
The first question, and the last, is “is it safe to drinkTea
?”
One necessary prerequisite is that it has been a full moon. I’m using
this boolean expression to inject the Halting Problem—we cannot
determine in a Turing-complete language whether a boolean expression
will evaluate to true or false, generally—but it is probably
sufficient to say that it is nondeterministic, even if not
Turing-complete. (Pragmatists love Turing completeness and
nondeterminism, because “more power is always better”.) So it’s hard
enough—by which I mean generally impossible—to say whether the s
assignment has happened.
The next prerequisite, which should drive us wholly into despair now
if hope yet remains, is that the moon has been full once. Eh? Here’s
where the tie of variable types to any semblance of code structure
breaks down completely, because s
takes on a surprisingly large
number of types in this code sample.
To assign a precise type to this program, we have to accurately model
what is happening in it. So, suppose that prior to entering this loop,
the type of s
is string
. Each assignment to s
—made each time the
while
’s test is true and the moon is full—takes us one step down
this list of types.
- string
- {teaTime: string}
- {teaTime: {teaTime: string}}
- {teaTime: {teaTime: {teaTime: string}}}
- {teaTime: {teaTime: {teaTime: {teaTime: string}}}}
- •••
Now if you want to assign (potentially) all of these infinite
possibilities to the program, you have to go even further from the
block structure model. Imagine a third dimension of the program text: at
the surface, you see s
having only the first two types above, but as
you look deeper, you see the branching possibilities—oh so many
iterations in, oh so many times the moon has been full—each assigning
different types to what is on the surface the same code. Looking at
this as two-dimensional text, you would only see the infinite
superimposition of all possible types of s
, weighted according to
their probability.
Three dimensions might be too few for this code.
Of course, there’s a well-known, sensible way to type this code, sans the
log
call: abandon the folly of modeling mutation and assign this recursive union type tos
for at least the whole scope of thewhile
loop, if not an even wider scope:
type TeaTimeTower = string | {teaTime: TeaTimeTower}
And supposing the
drinkTea
function is so polymorphic, all is well, and as a neat side bonus, easy to understand. But we aren’t here to pursue sanity; we gave that up to try to model mutation.
The devil in the delay
If fully desugared, while
is a two-argument (not counting still
thinking in CPS) higher-order function, taking test
functions as
arguments. Just like you’re writing Smalltalk.
while(() => forSomeTimeNow(), () => { log(s.substring(1)) if(itsAFullMoon(), () => s = {teaTime: s}) drinkTea(s) })
The thing that makes so much trouble for flow analysis is this delay. Type-changing requires us to contradict earlier refinements of a variable’s type, not simply refine them further. But the ability to capture a reference to a variable in a lambda means that we need a deep understanding of how that lambda will be used. It might never be invoked. It might be invoked later in the function, just when we thought it was safe to contradict whatever refinement it was type-checked with. It might be saved off in another variable or data structure elsewhere in the program, making reasoning about when the variable might be referenced in the future a futile endeavor.
Doing flow analysis with sources of delayed execution whose behavior
is 100% known, like if
and for
, is tricky enough. Doing it in the
presence of unknown, novel, potentially nondeterministic sources
of delay is intractable, if not impossible.
And that’s for the computer. How many dimensions does the model in your head have, now? Zero, no, negative points for abandoning this ivory-tower static analysis and declaring “common sense” the arbiter of your programs’ correctness.
Did anyone else see you come here?
An object with known properties can be thought of as a group of named variables. This is a longtime, straightforward way to represent modules of functions, or clean up a global namespace by putting a lot of related items laying on the floor into the same drawer.
Since we love mutation, and we love mutating variables, we should love mutating object properties (and their types) even more, right?
function schedule(s) { s.teaTime = "anytime" }
The type of s
in the caller after schedule
finishes is
straightforward: it’s whatever it was before, with the teaTime
field (whatever it might have been before, if anything) type set to
string, or perhaps the literal singleton type "anytime"
.
But what schedule
is so eager to forget will not be so easily
forgotten by the rest of the program.
Namely, the contradicted, earlier type of s
is very hard to
reliably eradicate. This is an aliasing problem, and it brings
the excitement of data races in shared-mutable-state multithreaded
programs to the seemingly prosaic JavaScript execution model.
To type s
in the aftermath of schedule
, you must perfectly answer
the question, “who has a reference to s
”? Suppose that teaTime
was
a now-contradicted function. Any code that calls that function via
property lookup on its reference now takes on another dimension:
before schedule
executes, it is safe, but afterwards it no longer
is, so it takes on the prerequisite “can only be called before calling
schedule(s)
.” The dimensional multiplication directly results from
the multiplication of possible types for s
.
The problem broadens virally when you try to model other variables
that are not s
, but whose types will still change due to
schedule
being called! Here is an example of such a variable.
const drinks = {coffee: coffeeMod; tea: s} // where s is the value we’re talking about
So all the analysis of references to s
induced by the type mutation
means references to drinks
must undergo the same ordeal. And
references to something that refers to drinks
, and references to
something that refers to that, and so on, ad infinitum.
And that is assuming we can statically determine what object identities will be flying around the program. As with so much else in this article, this is generally impossible.
By the way, the problem with lambdas is just a special case of this one; it’s exactly that lambdas alias variables that causes so much grief for our wannabe-mutating ventures.
A different kind of power, at a much more reasonable price
Since we are only imagining this insanity, not attempting to truly partake of it, we have something good to feel about, after all: the grass is really quite brown on the other side, alas, but that on this side is a little greener than first glance might indicate.
Type systems like those of Haskell, Java, OCaml, Scala, and many other languages simply don’t permit the types of variables to change. When you consider the introduction of type equalities in Haskell or Scala GADTs, or more direct refinements made in Typed Racket, Flow, or TypeScript; you can include all of these languages in the broader category of type systems whose variable types can only complement, never contradict.
This is a powerful simplifying assumption, because under this
restriction, none of the above problems matter. “Functional types”
are not only powerful enough to model programs, and far easier to
understand for the human programmer, they are the only way out of the
quagmire of complexity wrought by trying to “model mutation”. Even
problems that almost look amenable to mutation analysis, like the while
example above, admit a simpler solution in an immutable type like the
recursive TeaTimeTower
.
More power is sometimes worse.
When you forbid unneeded capabilities, you get back capabilities in other areas. Sometimes this comes in the form of improved understanding, such as we get for large programs by introducing type restrictions. It makes sense to give up “power” that is not practical to get benefits that are.
Take the forbidding of mutation. We take type-level immutability for granted in the same way that many practitioners take value-level mutability for granted. Perhaps one reason for resistance to functional programming might be that we are so accustomed to the drawbacks of unconstrained mutability that it does not seem quite as insane at the value level as, seen above, it is at the type level.
But, familiarity cannot make the insane any less so.
This article was tested with Flow 0.57.2 and TypeScript 2.5.
No comments:
Post a Comment