Monday, October 3, 2011

Yes, I Have No Class

OK, I think I do kind of get what's going on with Beta. Yes, function activations are objects, and function bodies are object initializers. (I can find references to this being a concept that Beta inherited from Simula, but what I haven't found is any actual Simula documentation that says Simula does things this way. As far as I can see, Simula has rather conventional-looking, distinct syntaxes for declaring classes and procedures. But Beta has just one syntax for both.)

(There's some other stuff going on in Beta, like the ability to keep a function activation around and rerun it in the same activation frame. This is excusable as an optimization of tail recursion, but letting people do it by hand at arbitrary points in the program is a perfect example of the principle that just because you can think of something, you are not obliged to go off and implement it.)

Not Your Grandpappy's this

To illustrate the activation-as-object idea (without resorting to Beta syntax, which is baffling to the uninitiated), here's a bowdlerized version that I hope won't be too offensive to modern functional sensibilities. Consider a representation of binary expressions:

def Binary (
public left: Expr,
public op: BinOp,
public right: Expr) =
this
using the usual ill-defined mongrel syntax. In this case, this is understood not to mean what it would in Java or Scala (the object that owns the function where this appears), but rather the current invocation (or call or activation) of the Binary function itself. So Binary is the name of a class. It's also the name of a function that constructs an instance of the class. We can tell Binary is a class because it returns this. (We could also make the return type of Binary explicit, as in:

def Binary (
public left: Expr,
public op: BinOp,
public right: Expr): Binary =
this
but my imaginary compiler for this non-language is somehow smart enough to infer this.)

A class with no features is not very useful. Binary has three features, all functions (because the universe contains nothing except functions and activations of functions. Of course, the functions themselves must be activations of the function constructor function. Does my head hurt enough yet?). I've marked these functions (left, op, and right) with public, not the more Scalaish val, to emphasize that I'm specifying only the visibility of these features, not that they are variables (even constant variables) in the usual sense—because there are no variables in the usual sense, only functions that always return the same value. Things that look like variable bindings are just functions defined always to return the argument (a function activation) passed to them.

A fatter example:

def Binary (
public left: Expr,
public op: BinOp,
public right: Expr) = {
def eval: Int =
left.eval mumble mumble right.eval
def show: String =
"(" + left.show + " " + op.show + " " + right.show + ")"
this
}
where I suppose the defs should actually be publics for consistency, but this train of thought is still way too half-baked to merit consistency (or unmixed metaphors). Since eval doesn't return an eval, and show doesn't return a show, we know these aren't constructors; they're just plain old functions.

Calling a plain old function, like calling a constructor function, allocates an activation record for the function (at least conceptually). But if the activation never escapes, it becomes garbage as soon as the function returns. An execution environment with a lick of sense would put such a thing on a stack instead of a heap, but let's loftily dismiss that as a mere implementation detail.

Found: One Grail (Slightly Holy)

Anyway, yes, activation-as-object does offer a solution to the puzzle of constructors.

To recap, that puzzle is: what is an object before its constructor is called? Bytecode fans will recall that the JVM constructs an object in two phases, first allocating it with the new instruction, and then invokespecialing its class's <init> method. The JVM verifier ensures that you can't get a hold of the object in between new and <init>, because it's not valid then: it's not a proper instance of its class yet, but (because new had to refer to the class to allocate it), it's clearly not an instance of anything else. It's just a nonthing from which you must avert your eyes.

But function invocation, at least as far as we let ourselves know, is atomic. Once the arguments are prepared, boom!, you're executing the function, with its activation record all nicely laid out for you, and with not so much as an eyeblink in between (assembly language programmers may dissent, but our fingers are firmly in our ears and we can't hear a word they're saying). That means that when objects are activations, there is no object to wonder about before its constructor is called: the object is the call to the constructor.

Further Adventures of a Vague Idea

In this non-language that I'm not describing, every identifier that appears in an expression (as opposed to a type declaration) is the name of a function. If that identifier is followed by an open paren or nothing, it's a function invocation. So left.eval above invokes left, which returns the first argument passed to Binary, and then invokes the eval function that is a feature of left (and presumably of every Expr). An identifier followed by an underscore is a reference to the function itself, as in Scala.

Since classes are functions, Expr _ is properly the name of the type of instances of (activations of, calls to) Expr. But after a colon, the trailing underscore is implied, so we can say left: Expr instead of left: Expr _. You might need the underscore in other contexts, like if (foo.class == Expr _) ....

Another random observation is that in Scala, val and var have different meanings in classes and functions. Declarations in a class can refer to each other (so that functions can recurse, and so on); but variable definitions in a function can refer only to local identifiers declared earlier in the function, not later. Since Non-Language functions are both classes and functions, you'll need some kind of syntax to distinguish between letrec-like declarations and sequential declarations, unless you can persuade yourself that you can do without the latter (can you? I never thought about this).

Hey, Look at the Ducks!

Please distract yourself immediately, lest you come up with questions about how this scheme might relate to inheritance, or parametric types, or anything else complicated. If you must, consider these an exercise for the reader (or for Gilad Bracha, whose Newspeak also borrows ideas, but thankfully not syntax, from Beta).