Sunday, October 27, 2013

A Brief Ceremony

Warning: This post makes use of unconventional notation. If you believe that the One True Meaning of the ASCII characters was revealed for all time to Saints Kernighan and Ritchie, or if you are otherwise firmly attached to the syntax of a particular programming language, read at risk of disgust and disgruntlement.

As I mentioned previously, I've been learning Standard ML as part of Dan Grossman's Coursera course. As usual when one starts to pick up a new language, ML seems remarkably clean and simple—and, as usual, I'm not sure whether that's because it is all that clean and simple, or whether I'm just learning the easy parts first, because that's generally where you start.

The parts of ML we've covered so far in class are focused on the functional basics: declaring a function, invoking a function, passing a function to another function—in short, the lambda calculus. And contemplating the lambda calculus in its many guises is always as pleasurable as watching fluffy clouds drift by in the summer sky.

But one thing that strikes me about ML is that its take on these basics is a bit verbose compared with other languages I've seen. For example:

(* ML *)
let val a = b
    val c = d
in  e
seems a lot wordier than Scala's:
// Scala
val a = b
val c = d
as well as requiring deeper indentation. And fn to introduce an anonymous function has a lot more visual weight than Haskell's simple backslash.

So all this got me to wondering: just how terse a notation can I come up with for declaring and applying functions? ML and Haskell already use whitespace as the application operator—and that's zero characters if you're not counting whitespace, so I can't imagine how you'd do better.

How about function declaration? All the languages I've ever encountered have distinct syntax for declaring a named function and binding an anonymous function to a name. For example, in Scala you have:

def foo (a: Int, b: Int) = bar
val foo = (a: Int, b: Int) => bar
and similarly in ML (currying rather than tupling, to keep up with the fashionable):
fun foo a b = bar
val foo = fn a => fn b => bar
Is there a way to make an analog to the val versions above so simple that the def/fun forms are unnecessary? If so, binding an identifier needs to be pretty concise. The best way I can think of to approach this is to think of the binding operator as an infix operator, as in a = b, with no introductory let or val. However, the equals sign usually has a pretty important role to play in comparisons, and I'd rather not overload it. Ideal would be ≡, which is Unicode character hex 2261 (“identical to”), but the gods of ASCII did not see fit to endow us with that character, and for regrettably many tools, venturing outside of 7-bit territory is still a leap into the mojibake abyss. So I've provisionally settled on:
a ~ b
which defines the identifier a to have the value of the expression b in whatever follows, up to the next scope-closing delimiter.

The next step in overthrowing the tyranny of the fun keyword is an infix lambda operator (about which the author of the Arcane Sentiment blog has done some philosophizing). Although most of the languages I know require some sort of introductory prefix when declaring an anonymous function, in Scala you can get away with:

a => b
provided that the type inference winds are blowing your way.

However, a two-character infix lambda looks pretty hefty to me, particularly if you're going to chain it in a curried function declaration like:

a => b => c => d
On the same basis, I reject /\ (even though it looks like a capital lambda). In fact, even single-character choices with a vague A-frame resemblance to the letter lambda seem to clamor for too much attention when you lay them out:
a \ b \ c \ d    -- does this look like quoted spaces to you?
a / b / c / d    -- would you do this to your division operator?
a % b % c % d    -- a lambda with a bug squashed on it?
What I'm really looking for is something as lightweight as the series comma used when declaring C-style tupled parameters. And then it occurred to me—the comma itself is up for grabs, as in:
a, b, c, d
So now we can turn something like:
val foo = fn a => fn b => bar
foo ~ a, b, bar

There remains one more thicket of verbosity to hack through before we can pose beneath the Mission Accomplished banner, and that is pattern matching. Pattern matching is a sign of civilization as surely as the motorcar and the wireless, and I'd hate to try to present any examples longer than half a line without it. Pattern matching syntax is one of ML's less attractive features, as it requires not one but two keywords to get going:

case foo of
    pat1 => expr1
  | pat2 => expr2
  | patn => exprn
Scala sort of starts out improving on that by having an infix match keyword:
foo match {
  case pat1 => expr1
  case pat2 => expr2
  case patn => exprn
but then you've got the curly braces and the repeated case keyword, and it all goes to hell.

One way to approach the patterns you're matching is to treat each as a function parameter (oh, didn't I mention that all function parameters are patterns? No, because you already knew that). So given a value to match and a sequence of functions, the pattern matcher's task is to find the first function in the sequence that would be legal to call with the given value as an argument. Indeed, you might say that your sequence of functions is itself a function, and the value to match is its argument, and that's precisely how we'll do it:

f ~ f1 | f2 | ... | fn
is the function that first tries to call f1, then if that doesn't work, f2, and so on up to fn. So we've reduced the above Scala and ML examples to:
(pat1, expr1 | pat2, expr2 | ... | patn, expr2) foo

Now we're just about far enough along to fold something leftwards like:

foldl ~ f, z, ([L], z | a :: as, foldl f (f z a) as)
except that the L in [L] needs some excusing. ASCII just doesn't have enough matching delimiters. At a minimum, you'll want convenient groupers for lists and tuples. And if you start doing anything useful, you'll quickly want shorthand for sets and maps as well. Parentheses and curly braces are off limits, because I'm giving them their usual Scala meaning (mutatis mutandis), and less than and greater than are comparison operators (this isn't XML). That leaves an awful lot of work for the square brackets, so I'm letting the first token after the [ disambiguate, as in:
[T "This" "is" "a" "tuple"]
[L a list]
[S 1 2 3 1]                           -- a set of *three* elements
[M key1 -> value1 key2 -> value2]     -- and a map
As you can see, within square brackets, whitespace is treated as an item separator rather than function application (infix operators are still recognized). You can always parenthesize to call your favorite function:
[T (cos x + sin y) (sin x + cos y)]
And if you're really insecure about having lost your tupling comma, you can use double comma as an infix operator to make a synonym for the above like:
(cos x + sin y),,(sin x + cos y)

One last thought before putting this post to bed. It would be a wonderful universe if we could dispense entirely with a built-in if then else in favor of pattern matching, but even:

(true, expr1 | false, expr2) cond
is awfully talky (not to mention backwards), and I don't see how to trim it down without betraying my logic. So, infixally once more:
cond ? expr1 ! expr2
where exclamation mark takes over for C's colon. The colon is reserved for specifying types, about which, mercifully, I have given no thought whatsoever.


  1. For a real comparison, you would need curly braces to denote a code block in Scala. Otherwise the scope of a and c extends beyond their use in e. Still less verbose, true.

    1. Yes, I'm assuming that within curly braces (and at the top level), there's a Scala-like semicolon inference at work, that semicolons can separate function declarations, and that the scope of a function parameter extends only to the end of the function. In short, all the hand-wavy just-make-it-work stuff. I certainly haven't verified that those assumptions are consistent or reasonable. This is a thought experiment, not a language spec.

  2. Very nice dissertation, but I don't agree on the usefulness of shortness/compactness over (a more verbose) clarity of intention.
    e.g. I would argue that having (a,b) for tuples makes for a more expressive and readable language than ,,

    on the other hand I agree that a character set as limited as ASCII is really a restriction that we could do better without, at least for those with scientific/math background like me


    1. What's clear depends a lot on what you're used to. But at least in my experience, clarity also correlates highly with compactness. Could I get used to this syntax enough to consider it superior to the next-most-verbose language I know? Maybe. I'm always psyched about something right after I write about it, so right now this syntax is looking pretty good. Whether it will be so appealing when I sober up is an open question.

      As for comma-comma vs. comma, certainly the former is a bit heavier than the usual tuple syntax (although you can sometimes omit parentheses around the tuple, a win by one character for a pair and a wash for a triple). But little functions are ubiquitous in many programming styles, perhaps even more common than tuples. Someone should undertake a cost-benefit analysis.

      Given the paucity of ASCII characters, there's no reasonable way to build something out of them that looks good and is faithful to the notation of any well-established domain. Life is just tradeoff after tradeoff.

    2. I meant "next-most-concise", not "next-most-verbose".