runtime generics in an erasure world

as we already know, generics in java are a compile time concept to help enforce type safety. during compilation, type erasure kicks in, resulting in the underlying bytecode being free of any generics information.

sometimes, however, we need generics information at runtime (such as when we need to convert a json string into its object form, for example). i was curious, how does this work given that types are erased at compile time? in other words, how does gson’s TypeToken class work?

in other words, why does doing this work when there are no generics at runtime (especially when, instead of String, the object type is a custom data object, for example?)

final Type typeToken = new TypeToken<List<String>>(){}.getType();
final String json = "[\"one\", \"two\"]";
final List<String> items = new Gson().fromJson(json, typeToken);

the tldr;

this great answer on StackOverflow answers the question nicely.

in summary, the java language spec specifies what the erased type of parameterized types, nested types, array types, and type variables is. it then says that “the erasure of every other type is the type itself.” TypeToken uses this fact to maintain generics information. as the TypeToken class’s javadoc says:

Forces clients to create a subclass of this class which enables retrieval the type information even at runtime.

stepping back

stepping back a bit, it’s pretty phenomenal seeing the effects of type erasure on bytecode directly. consider these two classes:

import java.util.List;

public class WithGenerics {
   List<String> data;
}

and

import java.util.List;

public class WithoutGenerics {
   List data;
}

if we compile these via javac and then look at the bytecode (using javap -v or using classyshark-bytecode-viewer), we’ll see:

notice that the bytecode is exactly the same for both classes. the only exception is that the type information is present in the signature of the WithGenerics class. if we are to run javap -v, we’ll see that this signature references the constant pool, where the type actually is.

{
  java.util.List<java.lang.String> data;
    descriptor: Ljava/util/List;
    Signature: #7    // Ljava/util/List<Ljava/lang/String;>;
}

in contrast, looking at WithoutGenerics, we’d see:

{
  java.util.List data;
    descriptor: Ljava/util/List;
}

“the erasure of every other type”

let’s take another example -

public class InnerType {
   public static class Internal<T> {}

   public static void main(String[] args) {
   }
}

after running javac, we end up with two classes - InnerType.class and InnerType$Internal.class. looking at InnerType$Internal.class via javap -v, we see the class defined as:

public class InnerType$Internal<T extends java.lang.Object> extends java.lang.Object

if we try to display the class information like this:

public class InnerType {
   public static class Internal<T> {}

   public static void main(String[] args) {
      Internal<String> internal = new Internal<>();
      Class<?> classType = internal.getClass();

      System.out.println(classType + ", " + classType.getGenericSuperclass());
   }
}

we get InnerType$Internal, with a superclass of java.lang.Object. now let’s try to modify the example slightly, and create an anonymous subclass of Internal, by doing this:

Internal<String> internal = new Internal<String>(){
   /* we could override methods here if we wanted to */
};

by just making that change, the app now writes that the class is InnerType$1, with a generic superclass of InnerType.InnerType$Internal<java.lang.String>. this generic superclass is actually a parameterized type, so we can cast it and extract extra information by doing something like this:

ParameterizedType t =
   (ParameterizedType) classType.getGenericSuperclass();
System.out.println(t.getOwnerType() + ", " + t.getRawType() + ", " +
   Arrays.toString(t.getActualTypeArguments()));

if we run this, we now get an owner type of InnerType, a raw type of InnerType$Internal, and the actual type arguments of java.lang.String.

what about TypeToken?

if we look back at the first Gson example, we notice the use of a TypeToken class provided by Gson. what does this class do? we care about two classes here, TypeToken, and $Gson$Types. looking at the constructor for TypeToken, we can see it does 3 things:

  1. calls a canonicalize method on the type
  2. gets the raw type
  3. calculates a hashcode

most importantly, the canonicalize method exists in $Gson$Types and returns a specific Type depending on the actual Type passed in - if it’s an array, for example, a GenericArrayTypeImpl is made. in the example above, a ParameterizedTypeImpl would be made, using the owner type, the raw type, and the actual arguments.

in this case, as callers of Gson’s api, we make a new TypeToken with our generic type parameters. internally, this generates a ParameterizedTypeImpl that can then be used within Gson to do the right thing during deserialization.

summary

in summary, whereas erasure erases generic types at compile time, libraries like gson take advantage of the fact that some types erase to themselves to have access to the generic type at runtime.