Immutability's Evil Stepsiblings
We all know that Mutability Is Bad. Bad to the bone. The language construct that dare not speak its name (at least in some languages). Or at any rate, those of us with a properly strict functional upbringing know this; there are still heathens out there who write Java (or even Scala or C++) code containing single equal signs without the least hint of shame.
Shudder.
Not talked about in the Best Families is the fact that the languages we all know and love are inclined, in the depths of night with the shades drawn, to treat the contents of memory as a big mutable array whose contents they can toy with as they please. An even darker secret is that the designers of libraries for ostensibly immutable data structures sometimes hide mutable variables in their cellars (at least temporarily, like Scala
Builder
s).Of course, there have been attempts to dress mutability up in civilized clothing and bring it into Polite Society. We've seen enough of Haskell's by-need evaluation and Scala's lazy vals that they rarely elicit anything more than mild titters.
Dare we ask ourselves now if there any more forms of mutability out there, longing to discard their black hats and shave their twirly mustaches, ready to submit themselves to the conventions of decent society and become upstanding, productive citizens?
We dare.
Lazy References
Lazy vals and by-need parameters can be understood as the combination of a mutable reference cell, a flag, and a nullary function. The flag tells whether the cell has been initialized; if set, the cell can be examined. If not set, any attempt to examine the cell invokes the function, puts the resulting value in the cell, sets the flag, and returns the value:
class LazyRef[T] (private var f: () => T) {
private var initialized = false
private var value: T = null.asInstanceOf[T]
def get: T =
synchronized {
if (! initialized) {
value = f()
initialized = true
f = null // lest f become garbage
}
value
}
}
There's mutability here, all right, but it's a very disciplined sort of mutability that can mingle with the inhabits of Immutableland without causing the least alarm.
value
can be changed, but only once, and you can never observe it before it gets changed, so it's as good as immutable if you overlook the peculiar circumstances of its birth—er, conventions of its initialization.Compulsive optimizers may observe that a straightforward implementation of
LazyRef
can take more space than necessary. At any given moment, either value
or f
is of interest, but never both at the same time. You might reimagine LazyRef
as:
class LazyRef[T] (private var funOrVal: Either[() => T,T]) {
def get: T =
synchronized {
funOrVal match {
case Right(v) =>
v
case Left(f) =>
val v = f()
funOrVal = Right(v)
v
}
}
}
Whether this version of
LazyRef
is any faster or more compact than the previous version depends on your execution environment. The point is, though, that because this construct is so useful, you're likely to make heavy use of it, and attention (from the language or VM) to make it more efficient might be well repaid.Sticky References
Lazy Val has a more rakish cousin I call a sticky reference, or stickref for short. Stickref is mostly safe around Respectable Society, but you'll want to ensure that he's properly chaperoned whenever he's in the presence of impressionable young algorithms.
Stickref differs from Lazy Val in that he doesn't start life with an initialization function. Instead, he remains brazenly uninitialized until given a value, which he then holds steadfastly ever after—he's immutable once initialized:
class UninitializedException extends RuntimeException
class AlreadyInitializedException extends RuntimeException
class StickRef[T] {
private var value: Option[T] = None
def get: T =
synchronized {
value match {
case Some(v) => v
case None => throw new UninitializedException
}
}
def freeze (newValue: T): T =
synchronized {
value match {
case Some(_) =>
throw new AlreadyInitializedException
case None =>
value = Some(newValue)
newValue
}
}
}
Stickref clearly feels less safe than Lazy Val; using a stickref can result in an exception, whereas using a lazy val cannot (unless its initialization function throws an exception).
On the other hand, a stickref can keep mutability from getting out of hand even when there's no readily available initialization function suitable for constructing a lazy val. This might bring to mind, for example, the setting of a
final
instance variable in a constructor.Minty, are you still holding on?
Construction Revisited
The whistle blows continuously and sparks fly from the great driving wheels as the engineer brakes, having spotted our heroine in her peril. It seems the heavy express train will never stop in time—but wait, who rides over the hill on a white horse? Could it be that wild fellow Stickref? It is! And quick as a flash he leaps down, undoes the ropes, and pulls Minty to him, averting disaster by a hair's breadth!
To revisit the last example from the previous post:
class T {
private final int i;
private final String s;
public T (int i, String s) {
this.i = i;
escapeFromHere(this);
this.s = s;
}
public int i { return this.i; }
public String s { return this.s; }
}
Can translating this into something with a stickref prevent access to uninitialized
final
instance variables?It can. In not-quite-Java:
class T {
private final StickRef<int> i = new StickRef<>;
private final StickRef<String> s = new StickRef<>;
public T (int i, String s) {
this.i.freeze(i);
escapeFromHere(this);
this.s.freeze(s);
}
public int i { return this.i.get(); }
public String s { return this.s.get(); }
}
Unlike the solutions of the previous post, I've made no attempt to catch invalid accesses at compiletime. Instead, I've wrapped the
final
instance variables in something that gives a runtime error for such accesses. It does, at least, catch the accesses as soon as they occur—unlike the regular Java semantics, which don't immediately throw an exception unless the default null
or zero value for the final
variables triggers such an exception.You'll notice that this solution is quite a bit simpler than my previous ideas; I don't need to generate any synthetic partially-constructed classes.
Minty-fresh constructor semantics look feasible after all, provided that you're patient enough to wait until runtime to enforce them.
But Can I Afford It?
Readers on a budget will have noticed that the indirection involved in a stickref has a cost.
freeze
ing isn't free, as they say. I haven't tried to benchmark the difference between access to a straight instance variable and access to one wrapped in a StickRef
, but I'll bet you that any VM you try it on will show that the stickref is quite a bit slower. (Let me know if there's some VM for which that's not true!)So how can I be pushing this stickref thing as a practical solution? What kind of
Well, I'm not really envisioning
StickRef
as a standalone thing. Existing VMs probably aren't tuned to detect this particular pattern (or LazyRef
, for that matter). But if you built StickRef
into a VM that knew to look for it, it doesn't take very sophisticated flow analysis to see that in most real-world constructors, the stickrefs are all frozen (the final
variables are all set) before the constructor returns. Seeing that, a smart VM would just discard the stickrefs and set the variables directly, which should eliminate the cost except when there's genuine uncertainty about whether an illegal access will occur.Multitrick Pony
OK, fine, so Stickref rescues Minty. Big deal. I mean, what kind of
As it happens, stickref semantics are a perfect fit for binding variables in Scheme's
letrec*
. And since letrec*
is more or less the archetype for initialization of mutually referential expressions, I can fairly claim that, in a multilingual environment, stickrefs are going to show up in quite a few places. And if you optimize stickrefs for minty-fresh construction, you optimize them for all those places, too.Bring the Whole Family
Incarnations of the idea of controlled mutability start popping out of the woodwork as soon as you start looking for them. Imaginge, for example, a variation on the Java semantics that guards against changing
final
variables at runtime rather than compiletime. The regular Java semantics set final
s exactly twice: the first time to the default value, and the second time to the constructed value. To achieve this effect you need something a little more flexible than a sticky reference: a freezable reference, which can be examined and set as many times as you like before you freeze it into immutability:
class AlreadyInitializedException extends RuntimeException
class FreezaRef[T] (private var value: T) {
private var frozen = false
def get: T = synchronized(value)
def set (newValue: T): T =
synchronized {
if (frozen)
throw new AlreadyInitializedException
else {
value = newValue
value
}
}
def freeze (newValue: T): T =
synchronized {
if (frozen)
throw new AlreadyInitializedException
else {
value = newValue
frozen = true
value
}
}
}
Beyond
FreezaRef
we can spot the outlines of some more exotic members of the family, such as meltable references. These references are also freezable, but they can also be unfrozen and become mutable again. Unlike ordinary mutable variables, meltable references expect to be unfrozen only infrequently—that is, the VM can assume they will be unfrozen infrequently in choosing how to optimize them. You could use something like meltable references as building blocks in a data structure holding code that can be optimized and deoptimized: when the reference to a piece of code is frozen (perhaps it's a method call whose target has been resolved), the code becomes eligible for optimization. When you melt the reference (perhaps another method has been loaded so that the original call site is no longer monomorphic), you'll need to deoptimize the code (perhaps optimizing it again later if it can be refrozen).The common theme with all these forms of mutable references is that they're only mutable sometimes. Over large and well-defined portions of a program's execution, they're immutable. That makes it easier for people to understand how they'll behave, and easier for compilers and VMs to figure out how to optimize them by eliding the synchronization and flag checks implied by the sort of code I've shown above.
So maybe we don't have to be so afraid of mutability. Give it a bath, teach it some table manners, keep it on a short-short leash, and we can welcome it into our home like a member of the family. Only VM and language designers need lie awake at night in a cold sweat over what might happen if something goes wrong.
Meanwhile, Back at the Construction Site
Controlled immutability is all very well, but it's distracted me from my original mission. What ever happened to unifying constructors and factory methods? I still have some half-baked ideas on that, but my blogging muscles are getting kind of sore, so I'll have to save it for another time.
next step would be to quash immutability on VMT and class inheritance tree.
ReplyDeletewhy loose time on checking the flag, that we really only need once ? It is redudant for 99,999% time.
the very pointer to LazyRef.get should be mutated instead of mutating the flag.
Hmm... that maybe would require subcassing of the already created object, but why not afterall ? :-)))
> more or less the archetype for initialization
ReplyDelete> of mutually referential expressions
How is it done in Java, so that at least one of final reference vars got NULLified again, breaking circular reference and allowing GC to dispose the objects ?
In GC-less languages that is not the problem, but in Java/Scala/etc having references there-and-back and then making them immutable - isn't it gonna be just heap leak?
Yes, the point is that a typical access to a tame mutant occurs in a context where you can easily prove that the initialized flag is already set, so that you can treat it as you describe -- that is, as a normal uncontrolled instance variable. I gave the code examples to describe the semantics, and also the fallback implementation for whenever the VM can't figure out that it can cheat.
ReplyDeleteCircular references are a problem for reference-counting garbage collection algorithms, but I don't know of any Java VM implementation that uses such an algorithm -- precisely because it has difficulty dealing with cycles, and Java lets you create cyclic data structures. There are plenty of other algorithms for collecting garbage that don't have this limitation (here again, Wikipedia will be happy to enlighten you: http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29).