Saturday, September 6, 2014

Code Dependent More

The more I learn about the dependently typed lambda calculus, the cooler it seems. You can use it to describe refinement types (aka subset types), which are about as expressive as a type system can get; you can prove theorems (see chapter 8 of the Idris tutorial); you can even, if the Homotopy Type Theory people have their way, use it to build an entirely new foundation for mathematics.

However, I think dependent types need a bit of a marketing makeover. They have a reputation for being abstruse, to say the least. I mean, have you even read the dependent types chapters in Benjamin Pierce's Types and Programming Languages, long the bible of the aspiring type theorist? No, of course you haven't, because those chapters weren't written (although dependent types do merit a whole section in Chapter 30). The dependently typed calculus is apparently so arcane that even the place you go to find out everything about types won't say much about them.

(Not that Pierce is in any way a slacker; his more recent Software Foundations is all about the use of dependent types in the automated proof assistant Coq.)

Although following all the mathematics to which dependent types lead takes a good deal of sophistication (a lot more than I've got), and the implementation choices facing a programming language designer who wants to use dependent types are considerable (let's sweep those under a Very Large Rug), a description of what the dependently typed lambda calculus is is not particularly difficult to grasp for anyone who has used a statically typed programming language.

But then, of course, there's the word “dependent”, which pop psychology has not endowed with the most favorable of connotations. (I'd rather be addicted to coding than addicted to codeine, but still.) Surely more programmers would satisfy their curiosity about more expressive type systems if the dependently typed lambda calculus were just called something else (and while we're at it, preferably something shorter).

So What Is It?

If you haven't seen the dependently typed lambda calculus before, then you might want to start at the beginning, with the untyped lambda calculus (which, if you've used any kind of conventional programming language, you already sort of know, even if you don't know you know it). In the concise notation favored by programming language geeks, the grammar for the untyped calculus looks like:

     e  :=      v        a variable
λv.e a function with parameter v and body e
e1 e2 the function e1 applied to the argument e2
built-in value whatever you want: true, 42, etc.

This may look too succinct to make anything out of, but it just means that the untyped lambda calculus is a language that contains variables, functions, function calls, and built-in values. Actually, you can leave out the built-in values (I put them in blue so you can distinguish the pricey add-ons from the base model)—you'll still have a Turing-complete language—but the built-in values (along with parentheses, which you can use to group things) make it feel more like a Lisp dialect, maybe one where curried functions are the norm.

If you're a fan of static types you'd probably prefer a variant of the lambda calculus that has those, and the simplest way to do that is named (surprise!) the simply typed lambda calculus. It has a more complicated grammar than the untyped calculus, because it divides expressions into two categories: terms, which evaluate to a value at runtime, and for which I'll use the letter e; and types, which constrain terms in the usual statically typed way, and which get the letter t:

     e  :=      v        a variable
λvt.e a function with parameter v of type t and body e
e1 e2 the function e1 applied to the argument e2
built-in value whatever you want: true, 42, etc.
 
     t  :=      t1t2        a function type, where t1 is the input type and t2 is the output
built-in type whatever you want: Bool, Int, etc.

You'll notice (because it's bright red) that I added a type annotation (: t) as the only change to the term syntax from the untyped lambda calculus. Types can be either function types or built-in types.

This isn't a very sophisticated type system, and you'd hate trying to write any real code in it. Without more add-ons, we can't get correct types out of it for things like product types (think tuples) or sum types (disjoint unions).

Historically, and in most textbooks, the simply typed lambda calculus is augmented with a bunch of special-purpose gadgetry (like those missing product and sum types), and then followed by several progressively more powerful type systems, like the parametric polymorphism that is at the core of languages like Haskell and Scala.

Boooooring!

Instead, let's just skip to the good bits. The dependently typed lambda calculus is not only more powerful than everything we're skipping over; it's also got a shorter grammar:

     e  :=      v        a variable
λve1.e2 a function with parameter v of type e1 and body e2
e1 e2 the function e1 applied to the argument e2
e1 → e2 a nondependent function type, where e1 is the input type and e2 is the output
(ve1) → e2 a dependent function type, where v, the input, has type e1, and e2 is the output type (which can mention v)
* the type of types
built-in values and types whatever you want: true, 42, Bool, Int, etc.

But whoa, what happened to the types? Unlike the simply typed calculus, there's no t in this grammar! Well, the big thing about the dependently typed lambda calculus (aside from the dependent function types) is that types are themselves values (the OO folks would say they are reified), and type expressions may involve arbitrary computation. In short, types are just terms!

Everyone's Favorite Example

So what can you actually do with the dependently typed lambda calculus? Well, the traditional (practically mandatory) example is one of a list (let's follow convention and call it a Vec) that has its length encoded in its type.

(This is not really the most exciting thing you can do with dependent types. It's actually possible to encode natural numbers in the type system of Scala or Haskell, so you can already implement Vec in either of those languages. But the two languages' type systems are oriented towards doing typical type-level stuff, so numbers-as-types are pretty awkward—it turns out that regular numbers are still the most convenient way of representing numbers.)

You can specify the type of Vec in the dependently typed lambda calculus as:

    Vec: * → Nat → *

where Nat is the type of natural numbers. That is, a Vec is parameterized by the type of its elements (remember that * is the type of types) and the number of its elements—Vec is actually a function that, given those arguments, returns a type.

The constructor for the empty Vec, Nil, has type:

    Nil: (A: *) → Vec A 0

That is, given a type A, Nil gives you back a Vec of that type with zero elements. You'll remember that we said Vec itself is a function that takes two arguments and returns a type—so the type of Nil is indeed a type, even though it looks like a (plain old, value-level) function call. (If this is the first time you're seeing this, I hope that's as much of a rush for you as it was for me.)

It's clearly not fair for me to leave with you with just Vecs of zero elements, so here's the type of Cons, which constructs a nonempty Vec:

    Cons: (A: *) → (n: Nat) → A → (Vec A n) → (Vec A (n + 1))

That is, you feed Cons the element type A and the size n of the old list you are prepending to, along with an A and the old list, and you get back a new list with size n + 1.

Back to the Whinging

All this technical exposition is fine (although I've left out a whole lot of stuff, like how you actually execute programs written in these calculi), but I really came here to complain, not to explain.

When talking about the polysyllabically-named dependently typed lambda calculus, I have to keep using the terms “depedent function type” and “nondependent function type” (hardly anyone says “independent”), which further pushes up the syllable count. By the time I name what I'm talking about, I've forgotten what I was going to say.

I like to think of nondependent function types (say, A → B) as cake types, because the arrow (like frosting) separates two layers (A and B) with distinct, relatively homogeneous structure, like cake layers. By extension, dependent function types (say, (xA → b x)) are pie types: the variable x gets mixed from the left side of the arrow into the right, making the whole type chunky with xs, kind of like the bits of apple in an apple pie. So how about calling the whole calculus the pie calculus?

(Now, it may have occurred to you that you don't really need to have two different function types. Wouldn't a cake type be a degenerate version of a pie type, one that didn't happen to mention its variable on the right side of the arrow? Well, yes, you're right—the cake is a lie, albeit a little white lie. But the upshot is that the pie calculus is even simpler than I've made it out to be, because you can drop the production for the cake type from the grammar.)

As it happens, there are a couple of alternate notations for dependent function types. Instead of:

    (x: A) → B

you can say:

    ∀x: A. B

or:

    Πx: A. B

Hey, that last convention makes it look as if the pie calculus is actually the Π calculus! But no, that would make life too easy. It turns out that the pi calculus is already something else, namely a formalism for describing concurrent computations. It's related to the lambda calculus, so no end of confusion would ensue if you tried to hijack the name.

So I guess I'll have to keep calling my favorite calculus by its mind-numbingly long name. Life is unfair, etc., etc.

And So Can You!

You can mitigate some of life's unfairness by mastering the dependently typed lambda calculus yourself. Check out Coq (the aforementioned Coq-based Software Foundations is probably the easiest way to get into dependent types), or Idris, or Agda, or even F* (or, for a quite different take on what it means to implement the calculus, Sage), and start amazing your friends with your new superpowers!

3 comments:

  1. Nice introduction for newbies in dependently typed languages (like myself). I hope you continue with more posts digging deeper into this fascinating subject. Dependent typing really is the future of functional programming (and programming in general), with languages like Idris evolving nicely and even making it's way into Haskell (and also Scala and ML variants to some extent). One area I'm really excited about is the integration of theorem provers with type systems (like in LiquidHaskell and F*). That brings the practicality of using dependent types to whole new level.

    ReplyDelete
    Replies
    1. If I understand correctly, theorem provers pretty much fall out of any dependently typed language that (1) does all typechecking before runtime (unlike, say, Sage), and (2) insists on total functions. Under those conditions, the Curry-Howard correspondence guarantees that the typechecker acts as a proof checker. However, that might not be the same as practicality; putting the burden of proving complex type claims on the programmer instead of the compiler makes coding a lot more work (although also a lot more fun).

      Delete
    2. I guess so (I'm no expert on the dependent type theory). What I meant was that the type checkers in Idris, Agda etc. are really simple theorem provers. Basically you have to write all the proofs yourself as functions, which is really tedious (even using tactics). Using an SMT solver with builtin support for linear arithmetic etc. a lot of things can be proven automatically. Some of the usage examples of LiquidHaskell are really cool.

      Delete