Thursday, March 29, 2012

With a type system, whether you can write a program depends on whether you can prove its correctness.

The trouble with bad type systems is that you have to use “escape hatches” pretty frequently. In C and many of its derivatives, these take the form of casts, which, in type terms, are like saying “I can't prove this even a little bit, but trust me, it's true. Try it a few times and you'll see.” The traditional cast system is very broad, as it must be to account for the shortcomings of lesser type systems. No one wants to reimplement the collection classes for every element type they might like to use.

After being so impressed by the power of Haskell's inference, many people next discover that they can't put values of different types into lists.¹ Well, that's not quite true, you can always chain 2-tuples together, but that's not what you really mean. Well, what did you mean?

Oh, well, I meant that sometimes the elements of my list are integers, and sometimes they're strings.

Okay, no problem. Put the integer type and string type together as alternatives in a single type:
     data IntOrString = AnInt Int | AString String
     [AnInt 42, AString "hi"] -- has type [IntOrString] 
No, I meant that it's alternating integers and strings, one of each for the other.

Well why didn't you say so!
     data IntAndString = IntAndString Int String
     [IntAndString 42 "hi"] -- has type [IntAndString] 
You can't just stick integers and strings together in a list without proving something about what you mean. To write any program typefully, you have to prove that it sort of makes sense. In the former example, you really meant that each element could be either one, and you have to prove that it's one of those, and not, say, a map, before you can put it in the list. In the latter example, you have to prove that you have exactly one string for each integer that you put into the list.

This permits a more analytical approach to programming than can occur in latent-typed systems. Let's say you had the [IntOrString], and you realized it was wrong and changed it to [IntAndString] in one module. You have two other modules that are trying to use the lists, and now they don't work, because you didn't prove that you had one string for each integer in those modules. Now nothing loads, and you have to jump around for a bit fixing your proofs until you can test again. This separates the task into two phases: one where you're only thinking about and testing the proofs, and the other where you're thinking about and testing the behavior.

I don't think this is an unqualified improvement over the latent-typed situation. On one hand, breaking tasks down into little bits is the foundation of human software development. Moreover, this example clearly helped us to clarify what we meant about the structure we were building. On the other hand, sometimes I prefer to focus on getting one module right on both type and runtime levels before moving on to the next. This is harder to do with most typeful programming languages, as type errors naturally cascade, and types both described and inferred usually influence runtime behavior.


[1] Haskell also has escape hatches, but using them is viewed rather like gratuitous use of eval is by Lispers. Whereas most C and Java programs use casts, very few Haskell programs use Data.Dynamic, just as very few OCaml programs use Obj.magic.

No comments: