Tuesday, April 16, 2013

Sugar-Free Loops

Sugar-free loops: not a breakfast cereal.

I've recently been playing with a minimalist language that eschews pretty much all syntactic sugar. It has no classes (or typeclasses), no mutable state, no jumps or labels, no modules or packages. It has no libraries—and if it did, they'd be useless, because it has no identifiers with which to name the contents of the libraries.

The language is basically the lambda calculus, plus built-in integers, a few integer functions, and a conditional If. Variables are represented as zero-based De Bruijn indices, using the notation V(i) for the variable in the ith enclosing scope. V is actually a special case of the Var operator, whose argument can be an arbitrary expression rather than just an integer constant—useful if you want to invent arrays.

Calling this thing a “language” is probably an abuse of the term; it's just a data structure with an interpreter.

Although I wouldn't want to program such a thing on a regular basis, I did want to be able to process sequences of varying lengths with it. Given the pure functional nature of the language, this requires recursion. And given the lack of identifiers, the language doesn't do the recursion for me; I need to use an explicit fixpoint combinator. The call-by-value Y combinator in this language looks like:

  fix =
    Lam(
      Lam(V(1)(Lam(V(1)(V(1))(V(0)))))(
        Lam(V(1)(Lam(V(1)(V(1))(V(0)))))))

So I know I'll apply this to my looping function, but what does the looping function look like? Well, you can generally view a loop that computes a value as a fold. But unlike folds in high-level languages, there aren't have any prepackaged sequence operations to use in the fold, so I have to take those operations as parameters. Specifically, my fold (as passed to the Y comnbinator to create a usable recursive fold) will look like:

  λf.λmore?.λhead.λtail.λstep.λaccum.λnext.
    if not more? next then
      accum
    else
      f more? head tail step (step accum (head next)) (tail next)

where more? applied to a sequence tells whether there are any more elements (the sequence is not empty), head provides the current element of the sequence and tail the rest, step performs one step of the fold operation, accum is the value accumulated by the fold, and next is the sequence from which to get the next element. Stripped of sugar, this becomes:

  unfixedFold =
    Lam(                              // f (fold itself): V(6)
      Lam(                            // more: V(5)
        Lam(                          // head: V(4)
          Lam(                        // tail: V(3)
            Lam(                      // step: V(2)
              Lam(                    // accum: V(1)
                Lam(                  // next: V(0)
                  If(
                    Not(V(5)(V(0))),  // not more?
                    V(1),             // no more, return accum
                    V(6)(             // else apply f to
                      V(5))(V(4))(V(3))(V(2))(
                                      // same more?, head, tail, step,
                      V(2)(V(1))(V(4)(V(0))))(
                                      // step(accum,head(next)),
                      V(3)(V(0)))))))))))
                                      // and tail(next)

The only form of input in this language is to bind the input to variables, as if the program were called from within a series of nested functions applied to the inputs. A program to iterate through such inputs and compute the maximum value looks like:

  fix(unfixedFold)(
    Lam(Lt(V(0),Minus(Depth,I(1)))))(   // more?: variable index less
                                        // than lexical depth
    Lam(Var(Plus(V(0),I(1)))))(         // head: index into inputs (add
                                        // 1 to index to account for
                                        // head function itself)
    Lam(Plus(V(0),I(1))))(              // tail: increment index
    Lam(If(Gt(V(0),V(1)),V(0),V(1))))(  // step: max of arguments
    V(0))(                              // accum starts as zeroth input
    I(1))                               // index after zeroth input

Four out of five dentists do not recommend sugar-free languages, at least for their patients without patience.

Sunday, April 7, 2013

No Comment

Of all the features ever introduced into programming languages, surely the ability to insert natural-language comments is the worst thought out.

Think about it: have you ever seen a comment of more than one line that wasn't out-of-date by the time you read it? Or, if the comment was more-or-less accurate, one that didn't at least contain distracting misspellings and mispunctuations?

An ideal programming language would let you write all comments in the language itself, so that the compiler could verify that comments are properly structured and consistent with the code they describe.

Now I realize this post is appearing just a few days after April 1st, but I'm not entirely joking. Static type checking is, in effect, a system of compiler-checked comments. Type checking, of course, has a role in avoiding errors—and I do find that I do make type errors that the compiler flags, and I appreciate the ability to correct those errors before runtime. But far more often than the compiler finds a type error in my code, I read the type annotations I've made, because from day to day I tend to forget the types I'm working with at any given point in the code. That is, type annotations—like pretty much everything else in programming language source code—are best thought of as more for human consumption than for the compiler.

One reason I enjoy working with Scala more than Haskell is precisely what many people consider to be a flaw in Scala, namely that Scala can't do type inference in many places where Haskell can. Scala code is consequently littered with type annotations in places where Haskell would omit them—but that means I don't have to look very far to be reminded what anything is. As with natural language, redundancy may take up more space or time in transmission, but if not overdone it's a great aid to comprehension.

I've written some Python, and I enjoy the speed and concision with which you can whip up some code when you don't have to bother with type annotations. But it's not a language I would use for code I planned to work on for a long time, or to share with other people; I don't think I could keep the types clear in my head long enough. Unless, of course, I wrote some natural-language comments to remind me. But then I'd really want the compiler to make sure those comments didn't get out of date.

I'm still looking forward to the compiler that will complain when I write a comment that says “here is an implementation of quicksort” when what I've actually coded is bubblesort. But perhaps, a few years from now, someone will announce such a thing. On April 1st.