I've been trying to come up for a while with some general-purpose mechanisms for creating canonicalized classes. I ran into trouble trying to formulate the rules, and I realize now that was because I was trying to conflate three distinct concepts into one.
Canonicalization (also called “interning”) is a well-known design pattern; see, for example, the
Wikipedia article on interning of strings. A canonical object must have a well-defined immutable identity (which I am calling the object's “key”, by analogy with the use of the term “primary key” in referring to
relational databases), and an object with a given identity is stored just once. Canonicalization is useful for simplifying comparisons between objects (their references can be compared, instead of their contents) and also for saving space in applications where many identical objects might otherwise be allocated.
My realization was that I had been confusing the idea of an immutable object with the idea of an object with an immutable identity. Besides being candidates for canonicalization, objects with an immutable identity are also suitable as map keys, because two such objects, when equal, always look up the same value in a map. However, an object with an immutable identity need not be completely immutable. Consider, for example, this class:
final class C {
private final String s;
private int n;
C (String s) {
assert s != null;
this.s = s;
}
public boolean equals (Object o) {
if (o == this)
return true;
if (! (o instanceof C))
return false;
return ((C) o).s.equals(s);
}
public int hashCode () {
return s.hashCode();
}
public String toString () {
return s;
}
public synchronized int n () {
return n;
}
public synchronized void n (int n) {
this.n = n;
}
}
This class has an immutable identity, specified by the value of the
final
variable
s
:
s
is the only field examined by
equals
and
hashCode
, and so instances of this class are perfectly suitable for use as map keys. Nonetheless, this class also contains some mutable state, in the form of the variable
i
.
I'm not the only one to mix up the notions of immutable object and immutable identity. The
javadoc for java.util.Map
says:
Note: great care must be exercised if mutable objects are used as map keys....
the implication being that only completely immutable objects are suitable keys. The paragraph in question does go on to mention the
equals
and
hashCode
methods specifically, so the documentation is not incorrect; still, many other authors have carelessly used “immutable object” and “suitable for use as a map key” as synonymous.
In this post, then, I want to come up with usable definitions of these three concepts:
- Immutable: Applied to an object, it means that all instance variables of the object are
final
, and all the object's instance variables of reference type are null
or references to immutable objects; or, the object is an instance of a system class known to be immutable, like String
.
An immutable class is a class all of whose instances are known to be immutable; specifically, it is either a system class known to be immutable, or:
- All instance variables of the class are
final
; and
- All instance variables of the class are of an immutable type, where an immutable type is a primitive type or a reference to an immutable class; and
- All subclasses of the class are also immutable.
A possibly immutable class is a class that might have both mutable and immutable instances (because even though all its instance variables are declared final
and of possibly immutable types, the class, or the class of a reference reachable through one of its instance variables, might have a subclass with non-final
variables).
Note that these definitions do not permit an array to be considered immutable, because Java does not have a way to declare an array element final
.
- Keyable: Applied to an object, it means that the object meets the usual Java criteria for a suitable key in a hashed data structure. Specifically, the object may be an instance of a system class known to be keyable, such as
String
; but if it is not an instance of such a class, then, for an instance a
of class C
and some subset K (the “key fields”) of C
's instance variables, a
is keyable if and only if:
a.equals(b)
returns false
if b
is null
or of a class other than C
. Otherwise, it returns true
if and only if all corresponding variables in K are equal.
a.hashCode()
returns a deterministic value based only on the values of a
's instance variables in K (the value returned is not affected by the values of instance variables of any other object, or of any class's static
variables).
- All variables in K are
final
, and if of a reference type, either are null
or refer to keyable objects.
A class C
is a keyable class if every instance of the class is guranteed to be a keyable object. This requires that:
C
's equals
and hashCode
behave as described above for keyable objects; and
- All subclasses of
C
be keyable classes; and
- All of
C
's key fields be final
and of a keyable type, where a keyable type is a primitive type or a reference to a keyable class.
A possibly keyable class is a class that might have both keyable and nonkeyable instances.
- Canonical: Applied to a class, it means that the class is keyable and that some mechanism is used to avoid creating (except perhaps transiently) more than one instance of the class with the same key.
An instance of a canonical class is called a canonical object. A canonical object a
is necessarily the only one with its key, meaning that a.equals(b)
is equivalent to a == b
for any possible b
.
The discussion above doesn't mention
static
variables, which clearly are not part of an individual object's state, and so can be ignored when deciding whether an object meets any of the above definitions. More problematic are
transient
variables—although these are treated for the most part as ordinary instance variables, they are not serialized and so are not part of an object's state when it is saved and restored (possibly in another process). I am inclined now to exclude
transient
variables from consideration when applying the above definitions to the facilities I'm working on.