runtime generics in an erasure world
Nov 9, 2017 · 4 minute read · code, android
as we already know, generics in java are a compile-time concept that helps enforce type safety. during compilation, type erasure kicks in, resulting in bytecode whose instructions and type descriptors carry no generic type information.
sometimes, however, we need generics information at runtime, such as when converting a json string into its object form. i was curious: how does this work, given that types are erased at compile time? in other words, how does gson’s TypeToken class work?
put concretely, why does the following work when there are no generics at runtime, especially when the element type is a custom data class rather than String?
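// the trailing {} creates an anonymous subclass of TypeToken, which is what preserves List<String>
final Type typeToken = new TypeToken<List<String>>(){}.getType();
final String json = "[\"one\", \"two\"]";
final List<String> items = new Gson().fromJson(json, typeToken);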
the tldr;
this great answer on Stack Overflow covers the question nicely.
in short, the java language specification (§4.6) defines the erasure of parameterized types, nested types, array types, and type variables, and then says that “the erasure of every other type is the type itself.”
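a quick way to see the parameterized-type rule in action: two differently parameterized lists share one runtime class, because both erase to the raw ArrayList. this is a minimal sketch; the class name ErasureDemo is just for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // List<String> and List<Integer> both erase to the same raw class at runtime
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                 // true
        System.out.println(new ArrayList<String>().getClass()); // class java.util.ArrayList
    }
}
```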
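TypeToken uses the “every other type” rule to preserve generics information: new TypeToken<List<String>>(){} creates an anonymous subclass, which is an ordinary class that erases to itself, and its declared superclass, TypeToken<List<String>>, is recorded in its class file where reflection can read it back. as the TypeToken class’s javadoc says: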
Forces clients to create a subclass of this class which enables retrieval the type information even at runtime.
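the core of that trick fits in a few lines. the sketch below is not gson’s implementation; TypeRef is a hypothetical name. it relies only on Class.getGenericSuperclass(), which returns the parameterized superclass the compiler recorded for the anonymous subclass.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

// hypothetical minimal "super type token", in the spirit of gson's TypeToken (not its code)
abstract class TypeRef<T> {
    final Type type;

    protected TypeRef() {
        // getClass() is the anonymous subclass, e.g. the class behind new TypeRef<List<String>>() {}
        Type superclass = getClass().getGenericSuperclass();
        if (!(superclass instanceof ParameterizedType)) {
            throw new IllegalStateException("TypeRef must be subclassed with a concrete type argument");
        }
        // the superclass is TypeRef<X>; its single type argument X is the type we want
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        Type listOfStrings = new TypeRef<List<String>>() {}.type;
        Type nested = new TypeRef<Map<String, List<Integer>>>() {}.type;
        System.out.println(listOfStrings); // java.util.List<java.lang.String>
        System.out.println(nested);
    }
}
```

the constructor is protected and the class abstract so that, like TypeToken, it can only be used by subclassing, which is exactly what forces the compiler to record the type argument.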
stepping back
stepping back a bit, it’s pretty interesting to see the effects of type erasure on bytecode directly. consider these two classes:
import java.util.List;
public class WithGenerics {
List<String> data;
}
and
import java.util.List;
public class WithoutGenerics {
List data;
}
if we compile these via javac and then inspect the bytecode (using javap -v or classyshark-bytecode-viewer), we’ll notice that it is exactly the same for both classes. the only difference is that, in WithGenerics, the data field carries a Signature attribute holding the generic type. running javap -v shows that this signature references the constant pool, where the full generic type is stored:
{
java.util.List<java.lang.String> data;
descriptor: Ljava/util/List;
Signature: #7 // Ljava/util/List<Ljava/lang/String;>;
}
in contrast, looking at WithoutGenerics, we’d see only the descriptor:
{
java.util.List data;
descriptor: Ljava/util/List;
}
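the same difference is visible from running code. in this sketch (FieldSignatureDemo is an illustrative name, with the two classes nested so the file is self-contained), Field.getType() reflects the erased descriptor, while Field.getGenericType() reflects the Signature attribute when one exists.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Type;
import java.util.List;

class FieldSignatureDemo {
    // same shapes as the two classes above
    static class WithGenerics { List<String> data; }
    static class WithoutGenerics { List data; }

    // built from the descriptor (Ljava/util/List;), identical for both classes
    static Class<?> erasedType(Class<?> owner) {
        return field(owner).getType();
    }

    // uses the field's generic signature when present, else falls back to the raw class
    static Type genericType(Class<?> owner) {
        return field(owner).getGenericType();
    }

    private static Field field(Class<?> owner) {
        try {
            return owner.getDeclaredField("data");
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedType(WithGenerics.class));     // interface java.util.List
        System.out.println(erasedType(WithoutGenerics.class));  // interface java.util.List
        System.out.println(genericType(WithGenerics.class));    // java.util.List<java.lang.String>
        System.out.println(genericType(WithoutGenerics.class)); // interface java.util.List
    }
}
```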
“the erasure of every other type”
let’s take another example:
public class InnerType {
public static class Internal<T> {}
public static void main(String[] args) {
}
}
after running javac, we end up with two class files: InnerType.class and InnerType$Internal.class. looking at InnerType$Internal.class via javap -v, we see the class defined as:
public class InnerType$Internal<T extends java.lang.Object> extends java.lang.Object
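the declared type variable that javap shows here can also be read back through reflection. a small sketch (TypeParamDemo is an illustrative name) using Class.getTypeParameters():

```java
import java.lang.reflect.TypeVariable;
import java.util.Arrays;
import java.util.StringJoiner;

class TypeParamDemo {
    // same shape as InnerType.Internal above
    static class Internal<T> {}

    // lists each declared type variable with its bounds, as recorded in the class file
    static String describeTypeParameters(Class<?> c) {
        StringJoiner joiner = new StringJoiner(", ");
        for (TypeVariable<?> tv : c.getTypeParameters()) {
            joiner.add(tv.getName() + " extends " + Arrays.toString(tv.getBounds()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeTypeParameters(Internal.class)); // T extends [class java.lang.Object]
    }
}
```

note that this is declaration-side information only: it tells us Internal has a parameter T, not what T was for any particular instance.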
if we print the class information like this:
public class InnerType {
public static class Internal<T> {}
public static void main(String[] args) {
Internal<String> internal = new Internal<>();
Class<?> classType = internal.getClass();
System.out.println(classType + ", " + classType.getGenericSuperclass());
}
}
we get InnerType$Internal, with a generic superclass of java.lang.Object. now let’s modify the example slightly and create an anonymous subclass of Internal by doing this:
Internal<String> internal = new Internal<String>(){
/* we could override methods here if we wanted to */
};
with just that change, the program now reports that the class is InnerType$1, with a generic superclass of InnerType.InnerType$Internal<java.lang.String>. this generic superclass is a ParameterizedType, so we can cast it and extract extra information like this:
// needs: import java.lang.reflect.ParameterizedType; import java.util.Arrays;
ParameterizedType t =
    (ParameterizedType) classType.getGenericSuperclass();
System.out.println(t.getOwnerType() + ", " + t.getRawType() + ", " +
    Arrays.toString(t.getActualTypeArguments()));
if we run this, we now get an owner type of InnerType, a raw type of InnerType$Internal, and actual type arguments of java.lang.String.
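here is a self-contained version of this section that contrasts the plain instance with the anonymous subclass. it is a sketch with illustrative names; since the exact toString() formatting of generic types may differ between JDK versions, it is safer to compare the Type objects themselves than their printed form.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;

class AnonymousSubclassDemo {
    static class Internal<T> {}

    // generic superclass of the object's runtime class
    static Type genericSuperclassOf(Object o) {
        return o.getClass().getGenericSuperclass();
    }

    public static void main(String[] args) {
        Internal<String> plain = new Internal<>();
        Internal<String> anonymous = new Internal<String>() {};

        // plain: the runtime class is Internal itself, whose superclass is just Object
        System.out.println(genericSuperclassOf(plain)); // class java.lang.Object

        // anonymous: the runtime class extends Internal<String>, and that is recorded
        ParameterizedType t = (ParameterizedType) genericSuperclassOf(anonymous);
        System.out.println(t.getOwnerType() + ", " + t.getRawType() + ", "
                + Arrays.toString(t.getActualTypeArguments()));
    }
}
```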
what about TypeToken?
if we look back at the first Gson example, we notice the use of the TypeToken class provided by Gson. what does this class do? we care about two classes here: TypeToken and $Gson$Types. looking at the constructor for TypeToken, we can see it does 3 things:
- calls a canonicalize method on the type
- gets the raw type
- calculates a hashcode
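the three steps listed above can be sketched as follows. MiniTypeToken is hypothetical and not gson’s code: it skips canonicalization, handles only classes, parameterized types, and generic arrays when computing the raw type, and precomputes the hash code, presumably useful because tokens tend to be used as cache keys.

```java
import java.lang.reflect.Array;
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// hypothetical sketch of the three constructor steps; not gson's actual code
abstract class MiniTypeToken<T> {
    final Type type;
    final Class<?> rawType;
    private final int hashCode;

    protected MiniTypeToken() {
        // step 1: capture the type argument (gson would also canonicalize it here)
        ParameterizedType superclass = (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superclass.getActualTypeArguments()[0];
        // step 2: derive the raw class, e.g. List for List<String>
        this.rawType = rawTypeOf(type);
        // step 3: precompute the hash code from the captured type
        this.hashCode = type.hashCode();
    }

    // simplified raw-type lookup; wildcards and type variables are left out of this sketch
    static Class<?> rawTypeOf(Type type) {
        if (type instanceof Class) {
            return (Class<?>) type;
        }
        if (type instanceof ParameterizedType) {
            // declared as Type, but the JDK's implementation always returns a Class here
            return (Class<?>) ((ParameterizedType) type).getRawType();
        }
        if (type instanceof GenericArrayType) {
            Class<?> component = rawTypeOf(((GenericArrayType) type).getGenericComponentType());
            return Array.newInstance(component, 0).getClass();
        }
        throw new IllegalArgumentException("unsupported type: " + type);
    }

    // equal when the captured types are equal, even across different anonymous classes
    @Override
    public boolean equals(Object o) {
        return o instanceof MiniTypeToken && type.equals(((MiniTypeToken<?>) o).type);
    }

    @Override
    public int hashCode() {
        return hashCode;
    }

    public static void main(String[] args) {
        MiniTypeToken<List<String>> a = new MiniTypeToken<List<String>>() {};
        MiniTypeToken<List<String>> b = new MiniTypeToken<List<String>>() {};
        System.out.println(a.rawType);   // interface java.util.List
        System.out.println(a.equals(b)); // true
        System.out.println(new MiniTypeToken<List<String>[]>() {}.rawType); // class [Ljava.util.List;
    }
}
```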
most importantly, the canonicalize method lives in $Gson$Types and returns a specific Type implementation depending on the actual Type passed in: if it’s an array, for example, a GenericArrayTypeImpl is made. in the example above, a ParameterizedTypeImpl would be made, using the owner type, the raw type, and the actual type arguments.
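to make the idea of a ParameterizedTypeImpl “built from the owner type, the raw type, and the actual arguments” concrete, here is a hypothetical, stripped-down version with a canonicalize-style copy function. it is not gson’s $Gson$Types code. its equals follows the ParameterizedType contract, and its hashCode mirrors the formula OpenJDK’s own implementation uses (an implementation detail), so the copy is interchangeable with the JDK’s object.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class CanonicalizeDemo {
    // hypothetical, simplified take on a ParameterizedTypeImpl; not gson's code
    static final class SimpleParameterizedType implements ParameterizedType {
        private final Type ownerType;
        private final Type rawType;
        private final Type[] typeArguments;

        SimpleParameterizedType(Type ownerType, Type rawType, Type... typeArguments) {
            this.ownerType = ownerType;
            this.rawType = rawType;
            this.typeArguments = typeArguments.clone();
        }

        @Override public Type getOwnerType() { return ownerType; }
        @Override public Type getRawType() { return rawType; }
        @Override public Type[] getActualTypeArguments() { return typeArguments.clone(); }

        // equal to any ParameterizedType with the same owner, raw type and arguments
        @Override public boolean equals(Object o) {
            if (!(o instanceof ParameterizedType)) return false;
            ParameterizedType that = (ParameterizedType) o;
            return Objects.equals(ownerType, that.getOwnerType())
                    && Objects.equals(rawType, that.getRawType())
                    && Arrays.equals(typeArguments, that.getActualTypeArguments());
        }

        // same formula as OpenJDK's implementation, so hash codes agree as well
        @Override public int hashCode() {
            return Arrays.hashCode(typeArguments) ^ Objects.hashCode(ownerType) ^ Objects.hashCode(rawType);
        }

        @Override public String toString() {
            return rawType.getTypeName() + Arrays.stream(typeArguments)
                    .map(Type::getTypeName)
                    .collect(Collectors.joining(", ", "<", ">"));
        }
    }

    // canonicalize-style copy; a fuller version would also canonicalize the arguments
    // and handle arrays, wildcards and type variables
    static Type canonicalize(Type type) {
        if (type instanceof ParameterizedType) {
            ParameterizedType p = (ParameterizedType) type;
            return new SimpleParameterizedType(p.getOwnerType(), p.getRawType(), p.getActualTypeArguments());
        }
        return type; // plain classes are returned unchanged in this sketch
    }

    static class Holder<T> {}

    // obtains the JDK's own ParameterizedType for List<String> via an anonymous subclass
    static Type jdkListOfString() {
        ParameterizedType holder =
                (ParameterizedType) new Holder<List<String>>() {}.getClass().getGenericSuperclass();
        return holder.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        Type jdk = jdkListOfString();
        Type copy = canonicalize(jdk);
        System.out.println(copy.equals(jdk) + " " + jdk.equals(copy)); // true true
        System.out.println(copy); // java.util.List<java.lang.String>
    }
}
```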
in this case, as callers of Gson’s api, we create a new TypeToken with our generic type parameters. internally, this produces a ParameterizedTypeImpl that Gson can then use to do the right thing during deserialization, for example knowing that each element of the list should be read as a String.
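to illustrate what “doing the right thing” with such a type could look like, here is a toy, hypothetical converter (not gson’s api) that parses a comma-separated string differently depending on the element type it finds inside the List type it is handed.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class ToyDeserializer {
    // hypothetical type token: captures T from an anonymous subclass
    abstract static class Token<T> {
        final Type type =
                ((ParameterizedType) getClass().getGenericSuperclass()).getActualTypeArguments()[0];
    }

    // uses the element type of a List<E> type to decide how to convert each value
    static List<Object> parseList(String csv, Type listType) {
        if (!(listType instanceof ParameterizedType)
                || ((ParameterizedType) listType).getRawType() != List.class) {
            throw new IllegalArgumentException("expected a parameterized List type, got " + listType);
        }
        Type element = ((ParameterizedType) listType).getActualTypeArguments()[0];
        List<Object> result = new ArrayList<>();
        for (String part : csv.split(",")) {
            result.add(convert(part.trim(), element));
        }
        return result;
    }

    // picks a conversion based on the runtime element type
    private static Object convert(String value, Type element) {
        if (element == Integer.class) return Integer.valueOf(value);
        if (element == Boolean.class) return Boolean.valueOf(value);
        if (element == String.class) return value;
        throw new IllegalArgumentException("unsupported element type: " + element);
    }

    public static void main(String[] args) {
        Type ints = new Token<List<Integer>>() {}.type;
        Type flags = new Token<List<Boolean>>() {}.type;
        System.out.println(parseList("1, 2, 3", ints));     // [1, 2, 3]
        System.out.println(parseList("true,false", flags)); // [true, false]
    }
}
```

without the recovered element type, the converter would only see a raw List and could not tell "1" the number from "1" the string.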
summary
in summary, while erasure strips generic types from the bytecode that actually executes, libraries like gson take advantage of the fact that some types erase to themselves, and that a subclass’s generic superclass is recorded in its class file, to access the generic type at runtime.