Wednesday, July 27, 2011

Sticking to Mutability

In our last episode, we left Minty Fresh Construction lying tied to the tracks with the Locomotive of Hopeless Complexity bearing down on her at full steam. Well, Minty, you're just going to have to hang on for a bit, because we have more characters to introduce to the story. Maybe we'll see you again after the plot has thickened enough.

Immutability's Evil Stepsiblings

We all know that Mutability Is Bad. Bad to the bone. The language construct that dare not speak its name (at least in some languages). Or at any rate, those of us with a properly strict functional upbringing know this; there are still heathens out there who write Java (or even Scala or C++) code containing single equal signs without the least hint of shame.

Shudder.

Not talked about in the Best Families is the fact that the languages we all know and love are inclined, in the depths of night with the shades drawn, to treat the contents of memory as a big mutable array whose contents they can toy with as they please. An even darker secret is that the designers of libraries for ostensibly immutable data structures sometimes hide mutable variables in their cellars (at least temporarily, like Scala Builders).

Of course, there have been attempts to dress mutability up in civilized clothing and bring it into Polite Society. We've seen enough of Haskell's by-need evaluation and Scala's lazy vals that they rarely elicit anything more than mild titters.

Dare we ask ourselves now if there are any more forms of mutability out there, longing to discard their black hats and shave their twirly mustaches, ready to submit themselves to the conventions of decent society and become upstanding, productive citizens?

We dare.

Lazy References

Lazy vals and by-need parameters can be understood as the combination of a mutable reference cell, a flag, and a nullary function. The flag tells whether the cell has been initialized; if set, the cell can be examined. If not set, any attempt to examine the cell invokes the function, puts the resulting value in the cell, sets the flag, and returns the value:

class LazyRef[T] (private var f: () => T) {
  private var initialized = false
  private var value: T = null.asInstanceOf[T]
  def get: T =
    synchronized {
      if (! initialized) {
        value = f()
        initialized = true
        f = null // let f become garbage
      }
      value
    }
}
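For the Java-inclined, here's a rough sketch of the same idea (the class and demo names are mine, not from any library): the supplier runs at most once, on first access, and its result is cached thereafter.

```java
import java.util.function.Supplier;

// A sketch of LazyRef in Java (names mine): the supplier runs at most
// once, on first access, and its result is cached thereafter.
class LazyRef<T> {
    private Supplier<T> f;   // nulled out once the value is computed
    private T value;

    LazyRef(Supplier<T> f) { this.f = f; }

    synchronized T get() {
        if (f != null) {
            value = f.get();
            f = null;        // let the supplier become garbage
        }
        return value;
    }
}

public class LazyDemo {
    public static void main(String[] args) {
        final int[] calls = {0};
        LazyRef<Integer> r = new LazyRef<>(() -> { calls[0]++; return 42; });
        System.out.println(r.get());  // 42
        System.out.println(r.get());  // still 42; the supplier ran only once
        System.out.println(calls[0]); // 1
    }
}
```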

There's mutability here, all right, but it's a very disciplined sort of mutability that can mingle with the inhabitants of Immutableland without causing the least alarm. value can be changed, but only once, and you can never observe it before it gets changed, so it's as good as immutable if you overlook the peculiar circumstances of its birth—er, conventions of its initialization.

Compulsive optimizers may observe that a straightforward implementation of LazyRef can take more space than necessary. At any given moment, either value or f is of interest, but never both at the same time. You might reimagine LazyRef as:

class LazyRef[T] (private var funOrVal: Either[() => T, T]) {
  def get: T =
    synchronized {
      funOrVal match {
        case Right(v) =>
          v
        case Left(f) =>
          val v = f()
          funOrVal = Right(v)
          v
      }
    }
}

Whether this version of LazyRef is any faster or more compact than the previous version depends on your execution environment. The point is, though, that because this construct is so useful, you're likely to make heavy use of it, and attention (from the language or VM) to make it more efficient might be well repaid.

Sticky References

Lazy Val has a more rakish cousin I call a sticky reference, or stickref for short. Stickref is mostly safe around Respectable Society, but you'll want to ensure that he's properly chaperoned whenever he's in the presence of impressionable young algorithms.

Stickref differs from Lazy Val in that he doesn't start life with an initialization function. Instead, he remains brazenly uninitialized until given a value, which he then holds steadfastly ever after—he's immutable once initialized:

class UninitializedException extends RuntimeException
class AlreadyInitializedException extends RuntimeException

class StickRef[T] {
  private var value: Option[T] = None
  def get: T =
    synchronized {
      value match {
        case Some(v) => v
        case None => throw new UninitializedException
      }
    }
  def freeze (newValue: T): T =
    synchronized {
      value match {
        case Some(_) =>
          throw new AlreadyInitializedException
        case None =>
          value = Some(newValue)
          newValue
      }
    }
}

Stickref clearly feels less safe than Lazy Val; using a stickref can result in an exception, whereas using a lazy val cannot (unless its initialization function throws an exception).

On the other hand, a stickref can keep mutability from getting out of hand even when there's no readily available initialization function suitable for constructing a lazy val. This might bring to mind, for example, the setting of a final instance variable in a constructor.

Minty, are you still holding on?

Construction Revisited

The whistle blows continuously and sparks fly from the great driving wheels as the engineer brakes, having spotted our heroine in her peril. It seems the heavy express train will never stop in time—but wait, who rides over the hill on a white horse? Could it be that wild fellow Stickref? It is! And quick as a flash he leaps down, undoes the ropes, and pulls Minty to him, averting disaster by a hair's breadth!

To revisit the last example from the previous post:

class T {
  private final int i;
  private final String s;
  public T (int i, String s) {
    this.i = i;
    escapeFromHere(this);
    this.s = s;
  }
  public int i () { return this.i; }
  public String s () { return this.s; }
}

Can translating this into something with a stickref prevent access to uninitialized final instance variables?

It can. In not-quite-Java:

class T {
  private final StickRef<int> i = new StickRef<>();
  private final StickRef<String> s = new StickRef<>();
  public T (int i, String s) {
    this.i.freeze(i);
    escapeFromHere(this);
    this.s.freeze(s);
  }
  public int i () { return this.i.get(); }
  public String s () { return this.s.get(); }
}

Unlike the solutions of the previous post, I've made no attempt to catch invalid accesses at compiletime. Instead, I've wrapped the final instance variables in something that gives a runtime error for such accesses. It does, at least, catch the accesses as soon as they occur—unlike the regular Java semantics, which don't immediately throw an exception unless the default null or zero value for the final variables triggers such an exception.
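If you want to see this run in actual Java, here's one way to approximate it (the names StickRef, Widget, and escapeFromHere are mine; Java generics force a boxed Integer where the sketch had a bare int):

```java
// A runnable-Java approximation of the not-quite-Java sketch.
// Names (StickRef, Widget, escapeFromHere) are mine.
class StickRef<T> {
    private T value;
    private boolean set = false;

    synchronized T get() {
        if (!set) throw new IllegalStateException("uninitialized");
        return value;
    }

    synchronized T freeze(T newValue) {
        if (set) throw new IllegalStateException("already initialized");
        value = newValue;
        set = true;
        return newValue;
    }
}

class Widget {
    private final StickRef<Integer> i = new StickRef<>();
    private final StickRef<String> s = new StickRef<>();

    Widget(int i, String s) {
        this.i.freeze(i);
        escapeFromHere(this);  // anyone calling s() here gets an exception
        this.s.freeze(s);
    }

    static void escapeFromHere(Widget w) {
        // Imagine w leaking to another thread or callback here;
        // w.s() at this point throws instead of quietly returning null.
    }

    int i() { return i.get(); }
    String s() { return s.get(); }
}

public class StickDemo {
    public static void main(String[] args) {
        Widget w = new Widget(3, "minty");
        System.out.println(w.i()); // 3
        System.out.println(w.s()); // minty
    }
}
```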

You'll notice that this solution is quite a bit simpler than my previous ideas; I don't need to generate any synthetic partially-constructed classes.

Minty-fresh constructor semantics look feasible after all, provided that you're patient enough to wait until runtime to enforce them.

But Can I Afford It?

Readers on a budget will have noticed that the indirection involved in a stickref has a cost. Freezing isn't free, as they say. I haven't tried to benchmark the difference between access to a straight instance variable and access to one wrapped in a StickRef, but I'll bet you that any VM you try it on will show that the stickref is quite a bit slower. (Let me know if there's some VM for which that's not true!)

So how can I be pushing this stickref thing as a practical solution? What kind of Scala library author spendthrift would act as if allocating additional wrapper objects all over the place had no adverse consequences?

Well, I'm not really envisioning StickRef as a standalone thing. Existing VMs probably aren't tuned to detect this particular pattern (or LazyRef, for that matter). But if you built StickRef into a VM that knew to look for it, it doesn't take very sophisticated flow analysis to see that in most real-world constructors, the stickrefs are all frozen (the final variables are all set) before the constructor returns. Seeing that, a smart VM would just discard the stickrefs and set the variables directly, which should eliminate the cost except when there's genuine uncertainty about whether an illegal access will occur.

Multitrick Pony

OK, fine, so Stickref rescues Minty. Big deal. I mean, what kind of Itanium instruction set designer madman would come up with such an elaborate Sarah Winchester architecture of quasi-mutable references just to get one little lousy effect, an error message for something hardly anyone (besides me) is actually stupid enough to do?

As it happens, stickref semantics are a perfect fit for binding variables in Scheme's letrec*. And since letrec* is more or less the archetype for initialization of mutually referential expressions, I can fairly claim that, in a multilingual environment, stickrefs are going to show up in quite a few places. And if you optimize stickrefs for minty-fresh construction, you optimize them for all those places, too.
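To make that concrete, here's a sketch (names mine) of letrec*-style mutual recursion built on sticky cells: each binding starts empty and is frozen in order, and the closures dereference their neighbors only when called, after everything is initialized.

```java
import java.util.function.IntPredicate;

// A sketch (names mine) of letrec*-style mutual recursion on sticky cells.
// Equivalent Scheme:
//   (letrec* ((even? (lambda (n) (or (= n 0) (odd? (- n 1)))))
//             (odd?  (lambda (n) (and (> n 0) (even? (- n 1))))))
//     (even? 10))
class Cell<T> {
    private T value;
    private boolean set = false;

    synchronized T get() {
        if (!set) throw new IllegalStateException("uninitialized");
        return value;
    }

    synchronized void freeze(T v) {
        if (set) throw new IllegalStateException("already initialized");
        value = v;
        set = true;
    }
}

public class LetrecDemo {
    public static void main(String[] args) {
        Cell<IntPredicate> even = new Cell<>();
        Cell<IntPredicate> odd = new Cell<>();
        // Bindings are frozen in order; each lambda reads its sticky cell
        // only when invoked, by which time both cells are initialized.
        even.freeze(n -> n == 0 || odd.get().test(n - 1));
        odd.freeze(n -> n > 0 && even.get().test(n - 1));
        System.out.println(even.get().test(10)); // true
        System.out.println(odd.get().test(10));  // false
    }
}
```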

Bring the Whole Family

Incarnations of the idea of controlled mutability start popping out of the woodwork as soon as you start looking for them. Imagine, for example, a variation on the Java semantics that guards against changing final variables at runtime rather than compiletime. The regular Java semantics set finals exactly twice: the first time to the default value, and the second time to the constructed value. To achieve this effect you need something a little more flexible than a sticky reference: a freezable reference, which can be examined and set as many times as you like before you freeze it into immutability:

class AlreadyInitializedException extends RuntimeException

class FreezaRef[T] (private var value: T) {
  private var frozen = false
  def get: T = synchronized(value)
  def set (newValue: T): T =
    synchronized {
      if (frozen)
        throw new AlreadyInitializedException
      else {
        value = newValue
        value
      }
    }
  def freeze (newValue: T): T =
    synchronized {
      if (frozen)
        throw new AlreadyInitializedException
      else {
        value = newValue
        frozen = true
        value
      }
    }
}

Beyond FreezaRef we can spot the outlines of some more exotic members of the family, such as meltable references. These references are freezable, but can also be unfrozen to become mutable again. Unlike ordinary mutable variables, meltable references expect to be unfrozen only infrequently—that is, the VM can assume they will be unfrozen infrequently in choosing how to optimize them. You could use something like meltable references as building blocks in a data structure holding code that can be optimized and deoptimized: when the reference to a piece of code is frozen (perhaps it's a method call whose target has been resolved), the code becomes eligible for optimization. When you melt the reference (perhaps another method has been loaded so that the original call site is no longer monomorphic), you'll need to deoptimize the code (perhaps optimizing it again later if it can be refrozen).
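A meltable reference might look something like the following sketch (the MeltRef name and API are my own invention; the post doesn't spell them out):

```java
// A sketch of a meltable reference (names mine): like FreezaRef, but it
// can be unfrozen again. Melting is expected to be rare, so a smart VM
// could optimize for the frozen state and deoptimize on melt.
class MeltRef<T> {
    private T value;
    private boolean frozen = false;

    MeltRef(T initial) { value = initial; }

    synchronized T get() { return value; }

    synchronized T set(T newValue) {
        if (frozen) throw new IllegalStateException("frozen");
        return value = newValue;
    }

    synchronized void freeze() { frozen = true; }
    synchronized void melt() { frozen = false; }
}

public class MeltDemo {
    public static void main(String[] args) {
        MeltRef<String> target = new MeltRef<>("interpreted stub");
        target.set("optimized code");
        target.freeze();   // stable: eligible for optimization
        // target.set(...) here would throw IllegalStateException
        target.melt();     // call site changed: deoptimize, mutable again
        target.set("deoptimized code");
        System.out.println(target.get()); // deoptimized code
    }
}
```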

The common theme with all these forms of mutable references is that they're only mutable sometimes. Over large and well-defined portions of a program's execution, they're immutable. That makes it easier for people to understand how they'll behave, and easier for compilers and VMs to figure out how to optimize them by eliding the synchronization and flag checks implied by the sort of code I've shown above.

So maybe we don't have to be so afraid of mutability. Give it a bath, teach it some table manners, keep it on a short-short leash, and we can welcome it into our home like a member of the family. Only VM and language designers need lie awake at night in a cold sweat over what might happen if something goes wrong.

Meanwhile, Back at the Construction Site

Controlled immutability is all very well, but it's distracted me from my original mission. What ever happened to unifying constructors and factory methods? I still have some half-baked ideas on that, but my blogging muscles are getting kind of sore, so I'll have to save it for another time.

3 comments:

  1. Next step would be to quash immutability on the VMT and class inheritance tree.

    Why lose time on checking the flag, which we really only need once? It is redundant 99.999% of the time.

    The very pointer to LazyRef.get should be mutated instead of mutating the flag.

    Hmm... that maybe would require subclassing of the already created object, but why not, after all? :-)))

  2. > more or less the archetype for initialization
    > of mutually referential expressions

    How is it done in Java, so that at least one of the final reference vars gets nulled again, breaking the circular reference and allowing the GC to dispose of the objects?
    In GC-less languages that is not a problem, but in Java/Scala/etc., having references there-and-back and then making them immutable: isn't it going to be just a heap leak?

  3. Yes, the point is that a typical access to a tame mutant occurs in a context where you can easily prove that the initialized flag is already set, so that you can treat it as you describe -- that is, as a normal uncontrolled instance variable. I gave the code examples to describe the semantics, and also the fallback implementation for whenever the VM can't figure out that it can cheat.

    Circular references are a problem for reference-counting garbage collection algorithms, but I don't know of any Java VM implementation that uses such an algorithm -- precisely because it has difficulty dealing with cycles, and Java lets you create cyclic data structures. There are plenty of other algorithms for collecting garbage that don't have this limitation (here again, Wikipedia will be happy to enlighten you: http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29).
