Thursday, September 17, 2009

Unconstructable Token Spotted at JVM Language Summit

I've been attending the JVM Language Summit at Sun's campus in Santa Clara, not because I'm implementing a programming language on the JVM, but in my capacity as programming-language-junkie-at-large. The big fuss this year is over the invokedynamic bytecode instruction, which (unlike the other ways Java bytecode provides for invoking a method) can be retargeted at runtime.

This is (at least potentially) a great efficiency gain for implementors of dynamically dispatched languages. Such languages choose the target of a method invocation based on the runtime rather than the compiletime types of their arguments (or even on the arguments' values), and may choose methods to invoke based on method arguments other than the first (the receiver). Often they also let you modify an object's class at runtime to add or remove methods, which requires a dispatch algorithm that not only differs from Java's, but that can change over time.

The infrastructure around invokedynamic is the java.dyn package proposed for Java 7. Since I hadn't been paying much attention to invokedynamic, it took me all day yesterday to wrap my head around the magic that not only allows it to work, but often to do so very efficiently (if you're lucky enough to have the JVM inline everything that it could).

That magic derives from two sources:
  • MethodHandles, which, in their simplest form, are direct references to method bodies, and which the JVM is smart enough to inline; and
  • MethodHandles.guardWithTest, which composes a new MethdHandle from a test and two alternatives, the result of which the JVM is also smart enough to inline.


In effect, at each call site, you can inline a bunch of code (which you can change later, if you need to) that chooses an invocation target. This code will often look as if it's making a choice based on the runtime types of the method arguments originally given, but in practice, if enough gets inlined, those types will be known to the JVM when it's compiling your target-choosing code—so that the target is known at compiletime, meaning no invocation overhead at all if the target is also inlined.

Ever so sweet, if the JVM is aggressive enough about inlining. Which it sometimes isn't, but that's another topic, perhaps best left to people who know more about how the inlining decisions are made.

Anyway, what I really meant this post to be about was a spotting of the Unconstructable Token pattern in the java.dyn API, specifically in the MethodHandle constructor: this constructor is protected, because you may need to make language-specific subclasses of MethodHandle to keep track of whatever your language needs to keep track of—but you obviously can't let people construct a method handle just anywhere. Hence the Magic Token argument (currently of type sun.dyn.Access, but I assume they'll change the name to something less Sun-specific before release), which ensures that only The Authorities can can create a method handle.

I hadn't originally thought of Unconstructable Token in terms of restricting where user-supplied subclasses of framework classes can be allocated, but this is a fine use of it, and Java modules will do nothing to make this use of the pattern obsolete.

Thursday, September 10, 2009

Immutable, Keyable, Canonical

I've been trying to come up for a while with some general-purpose mechanisms for creating canonicalized classes. I ran into trouble trying to formulate the rules, and I realize now that was because I was trying to conflate three distinct concepts into one.

Canonicalization (also called “interning”) is a well-known design pattern; see, for example, the Wikipedia article on interning of strings. A canonical object must have a well-defined immutable identity (which I am calling the object's “key”, by analogy with the use of the term “primary key” in referring to relational databases), and an object with a given identity is stored just once. Canonicalization is useful for simplifying comparisons between objects (their references can be compared, instead of their contents) and also for saving space in applications where many identical objects might otherwise be allocated.

My realization was that I had been confusing the idea of an immutable object with the idea of an object with an immutable identity. Besides being candidates for canonicalization, objects with an immutable identity are also suitable as map keys, because two such objects, when equal, always look up the same value in a map. However, an object with an immutable identity need not be completely immutable. Consider, for example, this class:


final class C {

private final String s;

private int n;


C (String s) {
assert s != null;
this.s = s;
}


public boolean equals (Object o) {
if (o == this)
return true;
if (! (o instanceof C))
return false;
return ((C) o).s.equals(s);
}


public int hashCode () {
return s.hashCode();
}


public String toString () {
return s;
}


public synchronized int n () {
return n;
}


public synchronized void n (int n) {
this.n = n;
}

}


This class has an immutable identity, specified by the value of the final variable s: s is the only field examined by equals and hashCode, and so instances of this class are perfectly suitable for use as map keys. Nonetheless, this class also contains some mutable state, in the form of the variable i.

I'm not the only one to mix up the notions of immutable object and immutable identity. The javadoc for java.util.Map says:


Note: great care must be exercised if mutable objects are used as map keys....


the implication being that only completely immutable objects are suitable keys. The paragraph in question does go on to mention the equals and hashCode methods specifically, so the documentation is not incorrect; still, many other authors have carelessly used “immutable object” and “suitable for use as a map key” as synonymous.

In this post, then, I want to come up with usable definitions of these three concepts:


  • Immutable: Applied to an object, it means that all instance variables of the object are final, and all the object's instance variables of reference type are null or references to immutable objects; or, the object is an instance of a system class known to be immutable, like String.

    An immutable class is a class all of whose instances are known to be immutable; specifically, it is either a system class known to be immutable, or:


    • All instance variables of the class are final; and

    • All instance variables of the class are of an immutable type, where an immutable type is a primitive type or a reference to an immutable class; and

    • All subclasses of the class are also immutable.


    A possibly immutable class is a class that might have both mutable and immutable instances (because even though all its instance variables are declared final and of possibly immutable types, the class, or the class of a reference reachable through one of its instance variables, might have a subclass with non-final variables).

    Note that these definitions do not permit an array to be considered immutable, because Java does not have a way to declare an array element final.

  • Keyable: Applied to an object, it means that the object meets the usual Java criteria for a suitable key in a hashed data structure. Specifically, the object may be an instance of a system class known to be keyable, such as String; but if it is not an instance of such a class, then, for an instance a of class C and some subset K (the “key fields”) of C's instance variables, a is keyable if and only if:


    • a.equals(b) returns false if b is null or of a class other than C. Otherwise, it returns true if and only if all corresponding variables in K are equal.

    • a.hashCode() returns a deterministic value based only on the values of a's instance variables in K (the value returned is not affected by the values of instance variables of any other object, or of any class's static variables).

    • All variables in K are final, and if of a reference type, either are null or refer to keyable objects.


    A class C is a keyable class if every instance of the class is guranteed to be a keyable object. This requires that:


    • C's equals and hashCode behave as described above for keyable objects; and

    • All subclasses of C be keyable classes; and

    • All of C's key fields be final and of a keyable type, where a keyable type is a primitive type or a reference to a keyable class.


    A possibly keyable class is a class that might have both keyable and nonkeyable instances.

  • Canonical: Applied to a class, it means that the class is keyable and that some mechanism is used to avoid creating (except perhaps transiently) more than one instance of the class with the same key.

    An instance of a canonical class is called a canonical object. A canonical object a is necessarily the only one with its key, meaning that a.equals(b) is equivalent to a == b for any possible b.


The discussion above doesn't mention static variables, which clearly are not part of an individual object's state, and so can be ignored when deciding whether an object meets any of the above definitions. More problematic are transient variables—although these are treated for the most part as ordinary instance variables, they are not serialized and so are not part of an object's state when it is saved and restored (possibly in another process). I am inclined now to exclude transient variables from consideration when applying the above definitions to the facilities I'm working on.

Thursday, September 3, 2009

Reflecting Darkly on Generics

Although you can do reflection on generic types in Java, the information available from java.lang.Class and the java.lang.reflect classes is pretty raw.

I came across a situation the other day where I had something like:


interface I<T> {
T foo ();
}

abstract class C implements I<String> {
}


and I wanted to ask: what does C.foo() return? The answer, obvious to a human reader, is String, but deducing this using reflection requires traversing C's ancestry and explicitly resolving the type variable T.

It would be nice if there were some general-purpose facilities built into Java to answer questions like:


  • If a given Method m is invoked on an object declared to be of type T, what are the type bounds on the returned value?

  • If a given Method m that takes n arguments is invoked an object declared to be of type T with arguments of specified types Ui, but omitting one argument k, so that 0 ≤ i < k and k < i < n, what are the type bounds on argument k?


Without having made any attempt to write any such facilities myself, I can't say exactly how hard they would be to write, but my feeling is that they would not be easy.

But, if that's not bad enough, they might be impossible to write in a definitive way consistent with the Java language specification—see, for example, this article, which says that the Java generic type inference algorithm is underspecified.