Thursday, March 29, 2012

With a type system, whether you can write a program depends on whether you can prove its correctness.

The trouble with bad type systems is that you have to use “escape hatches” pretty frequently. In C and many of its derivatives, these take the form of casts, which, in type terms, are like saying “I can't prove this even a little bit, but trust me, it's true. Try it a few times and you'll see.” The traditional cast system is very broad, as it must be to account for the shortcomings of lesser type systems. No one wants to reimplement the collection classes for every element type they might like to use.

After being so impressed by the power of Haskell's inference, many people next discover that they can't put values of different types into lists.¹ Well, that's not quite true; you can always chain 2-tuples together, but that's not what you really mean. Well, what did you mean?

Oh, well, I meant that sometimes the elements of my list are integers, and sometimes they're strings.

Okay, no problem. Put the integer type and string type together as alternatives in a single type:
     data IntOrString = AnInt Int | AString String
     [AnInt 42, AString "hi"] -- has type [IntOrString] 
No, I meant that it's alternating integers and strings, one string for each integer.

Well why didn't you say so!
     data IntAndString = IntAndString Int String
     [IntAndString 42 "hi"] -- has type [IntAndString] 
You can't just stick integers and strings together in a list without proving something about what you mean. To write any program typefully, you have to prove that it sort of makes sense. In the former example, you really meant that each element could be either one, and you have to prove that it's one of those, and not, say, a map, before you can put it in the list. In the latter example, you have to prove that you have exactly one string for each integer that you put into the list.
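To make the "prove it's one of those" point concrete, here is a minimal sketch of consuming the mixed list (describe is my own illustrative name, not from the post); the pattern match is exactly the proof that each element is an Int or a String, and nothing else:

```haskell
-- Consuming the mixed list: the pattern match is the "proof" that each
-- element is an Int or a String, and nothing else.
data IntOrString = AnInt Int | AString String

describe :: IntOrString -> String
describe (AnInt n)   = "int: " ++ show n
describe (AString s) = "string: " ++ s

main :: IO ()
main = mapM_ (putStrLn . describe) [AnInt 42, AString "hi"]
```

If you try to add, say, a map to the list, the program is rejected before it ever runs; there is no case of describe for it to fall through to.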

This permits a more analytical approach to programming than can occur in latent-typed systems. Let's say you had the [IntOrString], and you realized it was wrong and changed it to [IntAndString] in one module. Two other modules use those lists, and now they don't compile, because in those modules you never proved that you had one string for each integer. Now nothing loads, and you have to jump around for a bit fixing your proofs until you can test again. This separates the task into two phases: one where you're only thinking about and testing the proofs, and the other where you're thinking about and testing the behavior.
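What does "fixing your proofs" look like at such a boundary? One hypothetical repair (pairUp is my own name, a sketch and not from the post) is a function that re-proves, at runtime where it can't be avoided, that the old mixed list really does alternate:

```haskell
data IntOrString  = AnInt Int | AString String
  deriving (Eq, Show)
data IntAndString = IntAndString Int String
  deriving (Eq, Show)

-- pairUp re-proves, once, that the old-style list alternates
-- Int, String, Int, String…; everything downstream gets the
-- stronger [IntAndString] type and needs no further checks.
pairUp :: [IntOrString] -> Maybe [IntAndString]
pairUp []                           = Just []
pairUp (AnInt n : AString s : rest) = (IntAndString n s :) <$> pairUp rest
pairUp _                            = Nothing

main :: IO ()
main = print (pairUp [AnInt 42, AString "hi"])
```

The Maybe in the result is honest: the old type simply doesn't guarantee alternation, so the proof can fail, and the type system makes you say what happens then.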

I don't think this is an unqualified improvement over the latent-typed situation. On one hand, breaking tasks down into little bits is the foundation of human software development. Moreover, this example clearly helped us to clarify what we meant about the structure we were building. On the other hand, sometimes I prefer to focus on getting one module right on both type and runtime levels before moving on to the next. This is harder to do with most typeful programming languages, as type errors naturally cascade, and types both described and inferred usually influence runtime behavior.

[1] Haskell also has escape hatches, but using them is viewed rather like gratuitous use of eval is by Lispers. Whereas most C and Java programs use casts, very few Haskell programs use Data.Dynamic, just as very few OCaml programs use Obj.magic.
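For the curious, the footnote's escape hatch looks like this: Data.Dynamic gives you a genuinely heterogeneous list, with all type information deferred to runtime checks, much like a cast (this sketch is mine, not the post's):

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- A genuinely heterogeneous list: each element carries its type
-- at runtime, and every extraction is a checked "cast".
mixed :: [Dynamic]
mixed = [toDyn (42 :: Int), toDyn "hi"]

main :: IO ()
main = print (fromDynamic (head mixed) :: Maybe Int)
```

Even the escape hatch returns a Maybe rather than crashing on a bad guess, which is part of why reaching for it is so rare.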

Sunday, March 25, 2012

C and Java do not have good type systems.

You know how, once you learn Scheme, or Common Lisp, the idea of a language not providing lambda expressions and still somehow being good is just absurd? There are similar things I discovered about type systems when learning Haskell, as in, “it's just absurd that anyone thinks a type system without this feature is good.”

The exact features I'll describe later. But if your opinion of type systems is based on the really popular ones, know that those are missing the features in question. To be more direct, C and Java don't have “good” type systems.

Just be wary of forming an opinion of, say, the Haskell type system, based on the very severe limitations of something else.

Friday, March 23, 2012

Type systems help with “a class of errors”: factual, but not true.

A common characterization of type systems is to say they help with “a certain class of errors”. This is literally true, but usually serves to obscure the real issues. Perhaps I can help reveal them.

In one way, this is meant to say “hey, your code may pass the type checker, but that doesn't mean it's bug-free”. This is probably something many people who have never used latent typing need to hear and internalize; as I previously said in bold, type systems partially prove correctness. Untested code may still crash or give wrong results, no matter how wonderful the type checker says it is. Type checking cannot replace testing, and perhaps it never entirely will.

But saying “a certain class of errors” is usually meant to imply that this is just an itty-bitty area of erroneous programming that's going to be found and squashed in testing anyway. This leads to bad thinking in two ways:

First, limited to “a certain class” though it may be, the type system prevents errors in that area. As previously stated, that's in a whole other universe from what testing can provide.

Secondly, and more importantly, it ignores that once you have a powerful type system, you can design your programs to pull more information about the program into the type system, so that the type checker can eliminate more errors.

Take a type for pain relief

For example, consider the ubiquitous java.lang.NullPointerException and its relatives in other programming environments. Irritating, shows up everywhere, often too far away from the real cause of the error for the backtrace to be a great deal of help, and has a nasty habit of not popping up until real users try your program.

Haskell users just laugh at this, because they pulled the idea of a missing object into the type system:

     data Maybe a = Nothing | Just a 

Here | means “or”, and a is the type we're wrapping up in a Maybe. Now, when an operation can fail, everyone using that operation has to propagate or suppress the failure using either explicit matching or the (very powerful, but more on that later) higher-order functions provided for the purpose.
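Here is a small sketch of both styles of handling (the names ages and greeting are my own illustrations): Map.lookup returns a Maybe, so a missing key can't slip through unhandled; the caller must either pattern match or reach for a combinator like fromMaybe.

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

ages :: Map.Map String Int
ages = Map.fromList [("alice", 30)]

-- Explicit matching: the Nothing case must be written down.
greeting :: String -> String
greeting name = case Map.lookup name ages of
  Just age -> name ++ " is " ++ show age
  Nothing  -> name ++ " is unknown"

main :: IO ()
main = do
  putStrLn (greeting "alice")
  -- A higher-order helper: supply a default instead of matching.
  print (fromMaybe 0 (Map.lookup "bob" ages))
```

There is simply no way to forget the missing case, which is precisely what the NullPointerException crowd keeps doing.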

Much of typeful programming is thinking of ways to use the features of the type system to expand the range of things that can be automatically proven about the program. What you see in C and Java are poor examples of that, as I will briefly touch on next.
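A standard illustration of this design habit (my example, not the post's) is a list that is non-empty by construction, so "head of empty list" errors become unrepresentable rather than merely unlikely:

```haskell
-- Non-empty by construction: the first element is part of the
-- constructor, so there is no empty case to mishandle.
data NonEmpty a = a :| [a]

safeHead :: NonEmpty a -> a
safeHead (x :| _) = x   -- total: no empty case exists to forget

main :: IO ()
main = print (safeHead (1 :| [2, 3 :: Int]))
```

Nothing about this requires exotic type system features; it's a choice of data structure that moves an invariant from documentation into the types. (GHC ships essentially this type as Data.List.NonEmpty.)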

Tuesday, March 20, 2012

Optimization is just a side benefit.

A corollary to our previous definition of type systems as principally ways to prove correctness: many discussions programmers have about type systems seem to focus on the runtime cost of type checking, as if that were the primary issue. Common Lisp has a great answer to that in its optional type declaration system¹ and the powerful inference and inlining done by compilers such as CMU Python.²

To a typeful programmer, runtime optimizations are an interesting side-effect of types, but ultimately beside the point. The correctness-proving features have a much greater impact on how you think about data structure, algorithm, and API design, approaches to testing, etc.

Common Lisp's type declarations are very nice, but they are little help in broadly proving program correctness, which is mostly the point of typeful programming.

¹ The special form “the” offers a glimpse into this system.

² See the CMUCL FAQ question on Python for an explanation.

Sunday, March 18, 2012

Type systems prove things about your programs.

Typeful programmers don't seem to see types the way Common Lispers or C programmers do. Instead, type systems are how you can partially prove the correctness of your program.

Thinking of types as proof systems offers several useful intuitive analogies. It is not just more rigorous testing; it's in a whole other universe from testing. Scientifically, we don't accept a hypothesis that seemed to work when we tried it with a few different things; we demand proof, which is qualitatively different.

Proving your program's correctness may, in many cases, take more time than simply writing the program. (This is a complex issue I will return to later.) However, even in those cases, the benefit of the rigor of proof may outweigh the cost.

It may be that, when you are writing a program, you can intuitively see that your program is somewhat correct, but you don't know how to prove it. Well, if you can't prove it, should you, or anyone else, really trust your intuition? If you do prove it, no trust is necessary; all you need to do is trust the proving framework, the type system, to disallow nonsense, and the fact that the type system passed the program is enough to satisfy everyone that the proof is sound, so far as it goes.

Finally, type inference is “proof assistance”. Typeful compilers come with both proof checkers and proof assistance, but these, again, deserve separate posts.

Accordingly, I will often use the language of “proof” when talking about using type systems.

A Lisper's perspective on “new hotness” typeful programming languages

A few years ago, I didn't care about functional programming. A year ago, it would have been unthinkable for me to work with a bunch of type system fanatics. I've been a Lisper for a long time, very much disliking anything breaking the Lisp mold, and it was just a little over a year ago that I learned, and came to appreciate, Haskell and the advantages it provides.

This is how I perceive the modern world of typeful programming, as a programmer who most naturally thinks in Lisp, with some experience in that area and a relatively newfound appreciation for it.

I won't discuss the benefits of immutable, persistent data structures, except where they naturally come up in discussion of types. Clojure demonstrates that you don't need fancy type systems to get many of the advantages of immutable data. Were I to take on a large project in Common Lisp now, I would certainly try to use Funds or some implementation of Okasaki's data structures wherever beneficial, but I don't care to discuss that.

I will be similarly mute on the benefits of Haskell-flavored normal-order evaluation, for similar reasons. They will come up more often in relation to type systems, though.

I've broken up the discussion into simple assertions and discussion of them. You won't find rigorous logical backing of my statements, merely the empirical and anecdotal, because the former has already been done quite well elsewhere, and won't be as fun for me.

Here I keep links to articles already posted. The unlinked articles have been written, but not posted yet, for the sake of cadence.

  1. Type systems prove things about your programs.
  2. Optimization is just a side benefit.
  3. Type systems help with “a class of errors”: factual, but not true.
  4. C and Java do not have good type systems.
  5. With a type system, whether you can write a program depends on whether you can prove its correctness.
  6. “Inference” is “proof assistance”.
  7. Some type systems are much better than others.
  8. More type system features can make proofs harder to write.

Since last time

I'm going to diverge briefly from my usual focus on technical matters. Since my last posting here, several things have changed:
  1. I moved to Fort Lauderdale, FL to work for Velocitude, a company building generalized mobile web support in Clojure, a dialect of Lisp with special support for programming with immutable data structures, concurrency, and other functional programming goodness.
  2. Velocitude was bought by Akamai, and I moved to the Boston area to continue with the same work.
  3. Back in, oh, January or February 2011, I took the time to learn Haskell properly, including how to think in its type system.
  4. I joined S&P Capital IQ a couple months ago, to work with some hardcore type system fans in Scala.

I just wanted to provide some context for what follows, and will dwell no more on such matters.